Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!udel!rochester!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Vector processors, i860
Message-ID: <11820@pt.cs.cmu.edu>
Date: 7 Feb 91 18:00:44 GMT
References: <1991Feb4.194521.8384@cs.uiuc.edu> <11798@pt.cs.cmu.edu>
Organization: Carnegie-Mellon University, CS/RI
Lines: 34

In article <11798@pt.cs.cmu.edu> I wrote:
>In article <1991Feb4.194521.8384@cs.uiuc.edu>
>	gillies@cs.uiuc.edu (Don Gillies) writes:
>>...the IBM 6000 can issue three instructions of the *same* kind at
>>the same time (i.e. FPU, FPU, FPU).
>
>I don't believe that this is correct. The IBM can (peak) issue *four*
>instructions per clock, but they have to be of the four different
>kinds that the machine distinguishes.
>
>There is only one bus from the I-cache/despatcher to the FPU. At
>peak, one FPU instruction travels over it, and is queued in the FPU
>for actual execution.

Evidently I misspoke. There are two buses from the I-cache/despatcher
to the FPU and FXU (integer unit). IBM paid the pins to send both
buses to both units. So, you really can issue two FPU instructions
per clock - or two FXUs - or one of each. The queue in each execution
unit can dequeue/initiate one per clock, but can enqueue two per
clock.

For comparison, the Omron Luna on my desk can initiate four
instructions per clock, in any mix. That's a cheat: it contains four
88000's. For some applications (such as mine, this week), this is
actually better, because it gives a different balance of resources -
mostly, for me, a big CPU-cache bandwidth. It was fun, the first time
I did a process list, and saw four entries with %CPU at 98+%.

The big issue with high-end processors is keeping them fed. The R4000
press release "disclosed" 128 bits of data path to the external
cache: I expect several announcements this year that are at least as
wide.
-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics
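
A rough sketch of the queue behaviour described above (enqueue up to two per clock, initiate one per clock), for anyone who wants to see how the backlog builds. This is a toy model, not a description of the actual hardware: the function name simulate, the unbounded queue depth, and the rule that an instruction arriving in a given clock cannot be initiated until the next clock are all assumptions made for illustration.

```python
from collections import deque

ENQUEUE_PER_CLOCK = 2  # from the post: up to two instructions can arrive per clock
DEQUEUE_PER_CLOCK = 1  # from the post: one instruction is initiated per clock


def simulate(arrivals):
    """Return the queue depth at the end of each clock.

    arrivals[i] is the number of instructions the dispatcher sends to
    this execution unit in clock i (0, 1, or 2).

    Assumptions (not from the post): the queue is unbounded, and an
    instruction that arrives in clock i is first eligible in clock i+1.
    """
    queue = deque()
    depths = []
    next_id = 0
    for sent in arrivals:
        if not 0 <= sent <= ENQUEUE_PER_CLOCK:
            raise ValueError("at most two instructions can arrive per clock")
        # Initiate (dequeue) at most one instruction that was already waiting.
        for _ in range(min(DEQUEUE_PER_CLOCK, len(queue))):
            queue.popleft()
        # Then accept this clock's arrivals over the two buses.
        for _ in range(sent):
            queue.append(next_id)
            next_id += 1
        depths.append(len(queue))
    return depths


if __name__ == "__main__":
    # Two clocks of dual issue to one unit, then the dispatcher backs off.
    print(simulate([2, 2, 0, 0, 1]))
```

Under these assumptions, a sustained two-per-clock stream to one unit grows the queue by one entry per clock, so dual issue to the same unit only helps in bursts; the long-run rate per unit is still one initiation per clock.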