Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!mcsun!cernvax!chx400!bernina!neptune!inf.ethz.ch!brandis
From: brandis@inf.ethz.ch (Marc Brandis)
Newsgroups: comp.arch
Subject: Re: Anything wrong with the i860
Message-ID: <28846@neptune.inf.ethz.ch>
Date: 21 May 91 07:22:09 GMT
References: <dank.674423728@blacks> <1991May16.221437.10751@rice.edu> <848@llnl.LLNL.GOV> <1991May17.143025.24242@rice.edu>
Sender: news@neptune.inf.ethz.ch
Reply-To: brandis@inf.ethz.ch (Marc Brandis)
Organization: Departement Informatik, ETH, Zurich
Lines: 37

In article <1991May17.143025.24242@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
>>That the chip is so difficult to compile for indicates a
>>poorly designed architecture.
>
>Perhaps so.  But weren't vectors machines considered difficult targets
>for years (aren't they still)?  And consider the difficulties
>compilers have had with parallel machines of all sorts.
>The 860's just a new and interesting problem.

In other words: The hardware designers guarantee that compiler researchers
still have something to do by making a bad design from time to time. -:)
It was always my impression that a good architecture tried to balance the
things done in hardware and the things done in software, to get the most out
of current technology in both fields. The market does not care whether
somebody may eventually clear the hurdle of writing a compiler that achieves
good speed on the i860. The i860 has to stand against current
architecture/compiler pairs, not against the ones of 1995.

>The RS/6000 and HP-snakes spend more hardware on the implementation
>and are able to have a cleaner and simpler architecture (an approach
>I support), but the 860 is still faster in some cases
>(say, multiplication of large matrices).

They spend more hardware, and they get more performance. The RS/6000 FP
hardware is three times as fast as that of the i860 (both the adder and
the multiplier run in one cycle, while on the i860 they need three). If you
have completely vectorizable code, you can get one add and one multiply per cycle
out of both the RS/6000 and the i860. Note that the startup time for such a
pipelined loop is pretty high on the i860. Anyway, could you please explain
how the i860 should be faster on matrix multiply than the RS/6000, assuming
both run at the same clock rate?
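
To make the latency versus throughput point concrete, here is a rough toy
model in Python. It is only a sketch under stated assumptions: the one-cycle
and three-cycle latencies are the figures quoted above, the function names
are made up for illustration, and memory traffic, issue restrictions and
everything else about the real chips is ignored.

```python
# Toy cycle-count model, not a simulation of either chip.
# Assumption: at most one multiply and one add can be started per cycle,
# and each unit delivers its result after a fixed latency.

def pipelined_cycles(n, mul_latency, add_latency):
    """Cycles for n independent multiply-add pairs (fully vectorizable code).

    One multiply is started per cycle; the last one starts in cycle n - 1
    and its add finishes mul_latency + add_latency cycles later.
    """
    if n == 0:
        return 0
    return (n - 1) + mul_latency + add_latency


def dependent_cycles(n, mul_latency, add_latency):
    """Cycles for n multiply-add steps where each step needs the previous
    result, so nothing overlaps and the full latency is paid every time."""
    return n * (mul_latency + add_latency)


if __name__ == "__main__":
    n = 1000
    for label, latency in (("one-cycle units", 1), ("three-cycle units", 3)):
        print(label,
              "pipelined:", pipelined_cycles(n, latency, latency),
              "dependent:", dependent_cycles(n, latency, latency))
```

For n = 1000 the model gives 1001 against 1005 cycles for the pipelined case,
but 2000 against 6000 cycles for the dependent case. With equal clock rates
the three-cycle units can at best tie on vectorizable code, and they lose by
a factor of three as soon as the operations depend on each other.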


Marc-Michael Brandis
Computer Systems Laboratory, ETH-Zentrum (Swiss Federal Institute of Technology)
CH-8092 Zurich, Switzerland
email: brandis@inf.ethz.ch