Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!ucla-cs!oahu.cs.ucla.edu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: High-Priority Instructions
Message-ID: <37503@shemp.CS.UCLA.EDU>
Date: 31 Jul 90 00:57:22 GMT
References: <37310@shemp.CS.UCLA.EDU> <1990Jul27.161856.25701@mozart.amd.com> <3612@yogi.oakhill.UUCP>
Sender: news@CS.UCLA.EDU
Organization: UCLA Computer Science Department
Lines: 63

In article <3612@yogi.oakhill.UUCP> marvin@yogi.UUCP (Marvin Denman) writes:
>In <37310@shemp.CS.UCLA.EDU> marc@oahu.cs.ucla.edu (Marc Tremblay) writes:
>>It looks like Motorola did not want to deal with functional unit
>>latencies when instructions are issued.
>>Otherwise they could use a result shift register where the "writeback slot"
>>is reserved in advance according to the latency of the functional unit used.
>>Conflicts are thus resolved in advance. Collisions cause stalling of
>>the issuing unit.
>
>I think that posters proposal of a shift register was intended for writeback
>reservations only, but I may have misunderstood. At the time we designed the
>88100 we did not seriously consider a shift register of results which is what
>Dave Christie seems to be talking about because of circuit complexities. We
>definitely considered a shift register for write back result reservations, but
>due to several considerations we decided that the arbitration scheme was more
>flexible.

You are right, my original posting concerned a shift register for
write back result reservation.

>Inserting only integer
>instructions between when the fp operation starts and is used will delay
>the fp result, but it does hide all of the latency except the 1 extra clock
>that the sequencer stalls to let the data write back before it is used. Even
>more important is that in most realistic code the instructions that you can
>insert will have some loads or maybe even branches that will free up writeback
                                           ^^^^^^^^
                                           pretty weird code!
>slots.
>Of course if you do not have enough independent instructions to
>completely hide the latency it does not matter which scheme you use because
>you will be waiting for the data anyway.

Good point.

...
[Marvin explains the rationale behind the prioritization scheme for the 88100]

Thanks for giving us the "inside" point of view.

A variation of another arbitration scheme, described in Mike Johnson's
dissertation "Super-Scalar Processor Design" (Stanford 89), could be used
to avoid starvation of floating-point results.
Johnson simulated an arbiter that assigns priority as follows:

  - there are two arbiters, one for integer results, one for
    floating-point results.
  - top priority is given to requests that have been active for more
    than one cycle (functional units make their request one cycle
    before they finish).
  - among requests of the same age, each arbiter uses a fixed order:

  - integer:  top:    ALU                      floating:  top:    add
                      Shifter                                     multiply
                      branch (return address)                     divide
              bottom: loads                               bottom: convert

In this way long writeback delays are avoided, since old requests are
served first. Johnson claims that the added complexity of such an arbiter
(compared to one that does not take time into account) is small, but of
course no layout metrics are given...

_________________________________________________
Marc Tremblay
internet: marc@CS.UCLA.EDU
UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc