Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: Compilers and efficiency
Message-ID: <3328@crdos1.crd.ge.COM>
Date: 12 Apr 91 13:16:35 GMT
References: <27fa3350.6bc2@petunia.CalPoly.EDU> <7117@auspex.auspex.com> <10095@mentor.cc.purdue.edu>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 25

In article <10095@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:

| If a hardware polynomial evaluation takes longer than an explicit loop,
| it is not the fault of the instruction, but of the implementation. Also,
| it is important not to compare the object codes produced by compilers, but
| by intelligent human beings, who can reason out how to use the features not
| supported by the languages.

  Obviously a bad algorithm is slow, however you implement it. A good
implementation of an instruction can be faster than the best hand-written
code, due to overlap of instructions. We had someone implement an FFT
instruction in VAX loadable control store (a master's thesis), and he got
about 15-20% better performance than the hand-coded assembler.

  You can get somewhat the same effect on a RISC machine if you feed it
good enough code and it has register scoreboarding or other techniques
which allow overlap.

  If I wanted maximum speed for some operation I would still hardcode an
instruction, but you need a certain dollar volume to justify building a
special instruction into a CPU instead of using the real estate for
something else. This is why we have coprocessors: to allow the user to buy
the instructions s/he needs.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  "Most of the VAX instructions are in microcode,
   but halt and no-op are in hardware for efficiency"
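
For reference, the "explicit loop" that a hardware polynomial evaluation instruction competes with is usually Horner's rule. The following is only a minimal sketch in C, not code from this thread; the function name poly_eval and the highest-order-first coefficient layout are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Evaluate c[0]*x^n + c[1]*x^(n-1) + ... + c[n] by Horner's rule.
 * "coeffs" holds degree+1 coefficients, highest order first
 * (an assumed layout, chosen for this illustration).
 */
double poly_eval(double x, const double *coeffs, size_t degree)
{
    double result = coeffs[0];

    /* One multiply and one add per remaining coefficient. */
    for (size_t i = 1; i <= degree; i++) {
        result = result * x + coeffs[i];
    }
    return result;
}
```

Each pass through the loop depends on the result of the previous one, so any overlap has to come from elsewhere: from pipelining inside the multiply and add, or from interleaving independent evaluations.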