Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!mcsun!cernvax!chx400!bernina!neptune!inf.ethz.ch!brandis
From: brandis@inf.ethz.ch (Marc Brandis)
Newsgroups: comp.arch
Subject: Re: Anything wrong with the i860
Message-ID: <28846@neptune.inf.ethz.ch>
Date: 21 May 91 07:22:09 GMT
References: <1991May16.221437.10751@rice.edu> <848@llnl.LLNL.GOV> <1991May17.143025.24242@rice.edu>
Sender: news@neptune.inf.ethz.ch
Reply-To: brandis@inf.ethz.ch (Marc Brandis)
Organization: Departement Informatik, ETH, Zurich
Lines: 37

In article <1991May17.143025.24242@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
>>That the chip is so difficult to compile for indicates a
>>poorly designed architecture.
>
>Perhaps so. But weren't vectors machines considered difficult targets
>for years (aren't they still)? And consider the difficulties
>compilers have had with parallel machines of all sorts.
>The 860's just a new and interesting problem.

In other words: The hardware designers guarantee that compiler researchers
still have something to do by making a bad design from time to time. -:)

It was always my impression that a good architecture tries to balance the
things done in hardware and the things done in software, to get the most out
of current technology in both fields. The market does not care whether
somebody may eventually clear the hurdle of writing a compiler that achieves
good speed on the i860. The i860 has to stand against current
architecture/compiler pairs, not against the ones in 1995.

>The RS/6000 and HP-snakes spend more hardware on the implementation
>and are able to have a cleaner and simpler architecture (an approach
>I support), but the 860 is still faster in some cases
>(say, multiplication of large matrices).

They spend more hardware, and they get more performance.
The RS/6000 FP hardware is three times as fast as the one in the i860 (both
the adder and the multiplier run in one cycle, while on the i860 they need
three). If you have completely vectorizable code, you can get one add and one
multiply per cycle out of both the RS/6000 and the i860. Note that the startup
time for such a pipelined loop is pretty high on the i860.

Anyway, could you please explain how the i860 would be faster on matrix
multiply than the RS/6000, assuming both run at the same clock rate?

Marc-Michael Brandis
Computer Systems Laboratory, ETH-Zentrum (Swiss Federal Institute of Technology)
CH-8092 Zurich, Switzerland
email: brandis@inf.ethz.ch
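
To make the latency argument above concrete, here is a rough sketch of a toy cycle-count model. It is an illustration only, not a description of either chip: the function names are made up, the latencies (1 cycle for the RS/6000, 3 cycles for the i860) are simply the figures quoted in the post, and memory traffic, instruction issue limits and loop overhead are all ignored.

```python
# Toy cycle-count model for a pipelined floating-point unit.
# Assumptions (not measured data): latencies are the figures quoted in the
# post, and the unit can accept one new operation every cycle.


def pipelined_cycles(n_ops: int, latency: int) -> int:
    """Cycles for n_ops independent operations, one issued per cycle."""
    if n_ops <= 0:
        return 0
    # The first result appears after `latency` cycles, then one per cycle.
    return latency + (n_ops - 1)


def dependent_cycles(n_ops: int, latency: int) -> int:
    """Cycles for a chain where each operation waits for the previous result."""
    return max(n_ops, 0) * latency


if __name__ == "__main__":
    for name, latency in (("RS/6000", 1), ("i860", 3)):
        print(name, pipelined_cycles(100, latency), dependent_cycles(100, latency))
```

Under these assumptions the two units are nearly equal on long, fully independent (vectorizable) operation streams, where only the startup cost differs, while a chain of dependent operations pays the full latency every time, which is where the factor of three shows up.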