Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: Compilers and efficiency
Message-ID: <3328@crdos1.crd.ge.COM>
Date: 12 Apr 91 13:16:35 GMT
References: <27fa3350.6bc2@petunia.CalPoly.EDU> <7117@auspex.auspex.com> <10095@mentor.cc.purdue.edu>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 25

In article <10095@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:

| If a hardware polynomial evaluation takes longer than an explicit loop,
| it is not the fault of the instruction, but of the implementation.  Also,
| it is important not to compare the object codes produced by compilers, but
| by intelligent human beings, who can reason out how to use the features not
| supported by the languages.
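
  For concreteness, the "explicit loop" being compared against a hardware
polynomial instruction would look roughly like the Horner's-rule sketch
below. The name poly_eval and the lowest-degree-first coefficient order
are assumptions made for illustration; they are not taken from any
particular machine or from the earlier articles.

```c
#include <stddef.h>

/*
 * Evaluate coef[0] + coef[1]*x + ... + coef[degree]*x^degree
 * using Horner's rule: one multiply and one add per coefficient.
 * Assumes coef holds degree + 1 entries, lowest degree first.
 */
double poly_eval(const double *coef, size_t degree, double x)
{
    double acc = coef[degree];          /* start with the leading coefficient */

    for (size_t i = degree; i-- > 0; )  /* walk down to coef[0] */
        acc = acc * x + coef[i];

    return acc;
}
```

  Each pass through the loop depends on the previous value of acc, so
any speedup from a single instruction has to come from overlapping the
loop overhead and coefficient fetches with the multiply-add chain.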

  Obviously a bad algorithm is slow however you implement it. A good
implementation of an instruction can be faster than the best hand-written
code, because the hardware can overlap its internal operations. We had
someone do an FFT instruction for the VAX loadable control store (master's
thesis), and he got about 15-20% over the hand-coded assembler.

  You can get somewhat the same effect on a RISC machine if you feed it
good enough code and it has register scoreboarding or other techniques
which allow overlap. If I wanted maximum speed for some operation I
would still hardcode an instruction, but you need a certain dollar
volume to justify building a special instruction into a CPU instead of
using the real estate for something else. This is why we have
coprocessors, to allow the user to buy the instructions s/he needs.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"