Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!ucla-cs!oahu.cs.ucla.edu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: High-Priority Instructions
Message-ID: <37503@shemp.CS.UCLA.EDU>
Date: 31 Jul 90 00:57:22 GMT
References: <37310@shemp.CS.UCLA.EDU> <1990Jul27.161856.25701@mozart.amd.com> <3612@yogi.oakhill.UUCP>
Sender: news@CS.UCLA.EDU
Organization: UCLA Computer Science Department
Lines: 63

In article <3612@yogi.oakhill.UUCP> marvin@yogi.UUCP (Marvin Denman) writes:
>In <37310@shemp.CS.UCLA.EDU> marc@oahu.cs.ucla.edu (Marc Tremblay) writes:
>>It looks like Motorola did not want to deal with functional unit
>>latencies when instructions are issued.
>>Otherwise they could use a result shift register where the "writeback slot"
>>is reserved in advance according to the latency of the functional unit used.
>>Conflicts are thus resolved in advance. Collisions cause stalling of
>>the issuing unit.
>
>I think that posters proposal of a shift register was intended for writeback 
>reservations only, but I may have misunderstood. At the time we designed the
>88100 we did not seriously consider a shift register of results which is what
>Dave Christie seems to be talking about because of circuit complexities.  We 
>definitely considered a shift register for write back result reservations, but 
>due to several considerations we decided that the arbitration scheme was more
>flexible.

You are right: my original posting concerned a shift register for
writeback reservations only, not a shift register of the results themselves.
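
To make the reservation idea concrete, here is a rough sketch in Python.
It is only a toy model: the class name, the single write port, and the
latencies in the usage note are my assumptions for illustration, not the
88100's actual design.

```python
class ResultShiftRegister:
    """Toy writeback reservation table for a single register-file write port.

    Hypothetical model: slots[i] names the functional unit that will write
    back i+1 cycles from now, or None if that cycle's port is still free.
    """

    def __init__(self, max_latency):
        self.slots = [None] * max_latency

    def try_issue(self, unit, latency):
        """Reserve the writeback slot `latency` cycles ahead.

        Returns False on a collision, in which case the issue unit stalls.
        """
        idx = latency - 1
        if self.slots[idx] is not None:
            return False
        self.slots[idx] = unit
        return True

    def tick(self):
        """Advance one cycle; return the unit writing back now, if any."""
        writer = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return writer
```

For example, issue a 3-cycle "fmul" and call tick() once: a 2-cycle
operation now collides with it and must stall, while a 1-cycle operation
still finds a free slot.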

>Inserting only integer
>instructions between when the fp operation starts and is used will delay
>the fp result, but it does hide all of the latency except the 1 extra clock
>that the sequencer stalls to let the data write back before it is used.  Even
>more important is that in most realistic code the instructions that you can
>insert will have some loads or maybe even branches that will free up writeback 
                                           ^^^^^^^^
					   pretty weird code! 

>slots.  Of course if you do not have enough independent instructions to 
>completely hide the latency it does not matter which scheme you use because
>you will be waiting for the data anyway.

Good point.

...
[Marvin explains the rationale behind the prioritization scheme for the 88100]

Thanks for giving us the "inside" point of view.
A variation of another arbitration scheme, described in
Mike Johnson's dissertation "Super-Scalar Processor Design" (Stanford, 1989),
could be used to avoid starvation of floating-point results.
Johnson simulated an arbiter that assigns priority as follows:

        - there are two arbiters, one for integer results,
          one for floating-point results.
        - top priority is given to requests that have been
          active for more than one cycle (functional units
          make their request one cycle before they finish).
        - priority order within each arbiter:

                    integer                    floating-point
          top:      ALU                        add
                    shifter                    multiply
                    branch (return address)    divide
          bottom:   loads                      convert

In this way, long latencies are avoided, since old requests are serviced
first. Johnson claims that the added complexity of such an arbiter
(compared to one that does not take request age into account) is small,
but of course no layout metrics are given...
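
The priority rule described above could be sketched as follows (Python).
This is my own reading of the scheme, not code from the dissertation: the
function names and the dict-of-request-ages representation are
hypothetical, and breaking ties among old requests by the fixed unit
order is an assumption.

```python
# Fixed unit order, top priority first, one list per arbiter.
INT_PRIORITY = ["alu", "shifter", "branch", "load"]
FP_PRIORITY = ["add", "multiply", "divide", "convert"]


def arbitrate(requests, priority):
    """Pick the unit granted the result bus this cycle, or None.

    `requests` maps unit name -> number of cycles its request has been
    active (1 = raised this cycle).
    """
    if not requests:
        return None
    # Requests active for more than one cycle outrank all fresh ones;
    # within each group the fixed unit order decides (assumption).
    return min(requests, key=lambda u: (requests[u] <= 1, priority.index(u)))


def step(requests, priority):
    """Grant one request, then age the requests that lost."""
    winner = arbitrate(requests, priority)
    if winner is not None:
        del requests[winner]
    for unit in requests:
        requests[unit] += 1
    return winner
```

With a load and an ALU result requesting in the same cycle, the ALU
wins; if a fresh ALU request arrives in the next cycle, the load, now
two cycles old, goes first, so it cannot be starved by a stream of
ALU results.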

_________________________________________________
Marc Tremblay
internet: marc@CS.UCLA.EDU
UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc