Path: utzoo!attcan!uunet!cs.utexas.edu!usc!ucla-cs!oahu.cs.ucla.edu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: lets start another processor war - i860 vs RIOS
Message-ID: <36258@shemp.CS.UCLA.EDU>
Date: 15 Jun 90 18:03:45 GMT
References: <22257@boulder.Colorado.EDU>
Sender: news@CS.UCLA.EDU
Distribution: comp
Organization: UCLA Computer Science Department
Lines: 54

In article <22257@boulder.Colorado.EDU> grunwald@foobar.colorado.edu writes:
>What I'm wondering is, can they stick in more FXU's or FPU's without
>changing the architecture semantics.
 ...
>So, for people who've look at the architecture, does it look like you
>could toss in another FXU and still keep precise exceptions? If not,
>would the FPU/FXU blocking make the advantages of an additional FXU
>moot?

The Fixed-Point Unit is very limited in the amount of parallelism it
can achieve. Even though the chip set is superscalar in the sense that
it issues several instructions per cycle, only one fixed-point
arithmetic instruction can be executed per cycle. Adding another ALU
seems to be a good way to take advantage of the additional parallelism
available. According to [Smith, Johnson, Horowitz ASPLOS 89], adding
an ALU is more profitable than adding another load pipe. Of course the
addition of another ALU would introduce out-of-order completion of
instructions, since the latencies of the different functional units
differ. A scheme such as a result buffer would then be required in
order to maintain precise interrupts.

The way the IBM chip set maintains precise interrupts is simple, but
it also impairs performance. For example, the FXU and the FPU are
interlocked in the first few pipeline stages, and loads (even
floating-point loads) are all handled by the FXU so that out-of-order
loads do not occur. That's fine, but it also means that the FXU can't
decode other instructions while it is handling loads, which is a
limitation. Also, since the FPU and the FXU can execute in parallel,
whenever an instruction in the FXU can cause an interrupt, subsequent
instructions in the FPU are blocked until the FXU finds out whether an
interrupt actually occurred. This reduces the possible overlap between
the two units in order to keep the synchronization simple. Parallelism
is sacrificed for the "rare" cases when interrupts occur. In future
implementations we may see a more elaborate scheme where the FPU is
allowed to proceed, provided its state can be restored upon an
interrupt.

Notice that most of the performance-impairing problems mentioned above
can be reduced by proper instruction scheduling. By mixing FXU and FPU
instructions and by scheduling potentially interrupt-causing
instructions carefully, the FPU *will not* have to wait for the FXU
interrupt to be resolved (it will already be resolved by the time the
FPU instruction reaches the execute stage).

So we ask the question:
"How often can the compiler schedule instructions so that the FPU and
the FXU stay synchronized and don't wait on each other?"

	- It doesn't matter, only SPECmark counts!

				Marc Tremblay
				internet: marc@CS.UCLA.EDU
				UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc
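
For the curious, here is a minimal C sketch of the result-buffer idea
mentioned above. It is not a description of the RIOS hardware; the
structure names, the buffer size and the fake register file are all
made up for illustration. It just shows how retiring results strictly
in program order gives precise interrupts even when the functional
units complete out of order.

/*
 * Toy result buffer: instructions may *complete* out of order,
 * but their results are written to the register file strictly
 * in program order, so an interrupt is taken at a precise point.
 * Everything here (names, sizes, the fake register file) is
 * invented for illustration only.
 */
#include <stdio.h>

#define RB_SIZE  8
#define NREGS    8

struct rb_entry {
    int busy;       /* entry allocated? */
    int done;       /* functional unit has produced the result? */
    int dest;       /* destination register */
    int value;      /* result value, valid when done */
    int fault;      /* did this instruction raise an exception? */
};

static struct rb_entry rb[RB_SIZE];
static int rb_head, rb_tail;         /* retire from head, allocate at tail */
static int regs[NREGS];

/* issue: allocate a buffer entry in program order */
static int rb_issue(int dest)
{
    int i = rb_tail;
    rb[i].busy = 1; rb[i].done = 0; rb[i].dest = dest; rb[i].fault = 0;
    rb_tail = (rb_tail + 1) % RB_SIZE;
    return i;
}

/* complete: a functional unit finishes, possibly out of order */
static void rb_complete(int i, int value, int fault)
{
    rb[i].value = value;
    rb[i].fault = fault;
    rb[i].done = 1;
}

/* retire: commit results strictly in program order */
static void rb_retire(void)
{
    while (rb[rb_head].busy && rb[rb_head].done) {
        struct rb_entry *e = &rb[rb_head];
        if (e->fault) {
            /* precise interrupt: everything older has been committed,
               nothing younger has touched the register file
               (a real machine would also flush the younger entries) */
            printf("interrupt at entry %d, state is precise\n", rb_head);
        } else {
            regs[e->dest] = e->value;
        }
        e->busy = 0;
        rb_head = (rb_head + 1) % RB_SIZE;
    }
}

int main(void)
{
    /* two independent ops issued in order: a long-latency op writing r1,
       then a quick op writing r2 */
    int slow = rb_issue(1);
    int fast = rb_issue(2);

    rb_complete(fast, 7, 0);   /* completes first, out of order ...        */
    rb_retire();               /* ... but cannot retire past the older op  */
    rb_complete(slow, 42, 0);
    rb_retire();               /* now both commit, in program order        */

    printf("r1=%d r2=%d\n", regs[1], regs[2]);
    return 0;
}

The whole point is that the buffer decouples completion order from
commit order; the price is the extra buffer and bypass hardware, which
is presumably what IBM decided not to pay for in this implementation.

And here is an equally rough model of the FXU/FPU interlock and of
what the compiler can buy back by scheduling. The two-cycle
fault-resolution latency and the one-instruction-per-slot issue model
are assumptions of mine, not RIOS numbers; the only claim is the
qualitative one made above, namely that filling the shadow of a
possibly-faulting FXU instruction with independent FXU work removes
the FPU stall.

/*
 * Toy cycle counter for the FXU/FPU interlock described above.
 * Assumption (mine, not IBM's): an FPU instruction stalls until the
 * fault status of the most recent FXU load is known; if the compiler
 * has put enough independent FXU work in between, the check is
 * already resolved and the FPU instruction issues without stalling.
 */
#include <stdio.h>

#define FAULT_LATENCY 2   /* cycles until "did the load fault?" is known */

enum kind { FXU_LOAD, FXU_ALU, FPU_OP };

static int run(const enum kind *prog, int n)
{
    int cycle = 0;
    int fault_known_at = -1;   /* cycle when the last load's fault status is known */

    for (int i = 0; i < n; i++) {
        if (prog[i] == FPU_OP && cycle < fault_known_at)
            cycle = fault_known_at;          /* FPU blocked by the interlock */
        if (prog[i] == FXU_LOAD)
            fault_known_at = cycle + FAULT_LATENCY;
        cycle++;                             /* one issue slot per instruction */
    }
    return cycle;
}

int main(void)
{
    /* load immediately followed by an FP op: the FPU eats the stall */
    enum kind naive[]     = { FXU_LOAD, FPU_OP, FXU_ALU, FXU_ALU };
    /* same work, but independent FXU ops fill the shadow of the fault check */
    enum kind scheduled[] = { FXU_LOAD, FXU_ALU, FXU_ALU, FPU_OP };

    printf("naive:     %d cycles\n", run(naive, 4));
    printf("scheduled: %d cycles\n", run(scheduled, 4));
    return 0;
}

Running it gives 5 cycles for the naive schedule and 4 for the
interleaved one; make the fault-resolution latency larger and the gap
grows, which is why "how often can the compiler do this?" is the
interesting question.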