Path: utzoo!attcan!uunet!decwrl!apple!amdcad!mozart.amd.com!nucleus!davec
From: davec@nucleus.amd.com (Dave Christie)
Newsgroups: comp.arch
Subject: Re: High-Priority Instructions
Message-ID: <1990Jul27.161856.25701@mozart.amd.com>
Date: 27 Jul 90 16:18:56 GMT
References: <58428@bbn.BBN.COM> <37310@shemp.CS.UCLA.EDU>
Sender: usenet@mozart.amd.com (Usenet News)
Reply-To: davec@nucleus.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 48

In <37310@shemp.CS.UCLA.EDU> marc@oahu.cs.ucla.edu (Marc Tremblay) writes:
>In article <58428@bbn.BBN.COM> schooler@oak.bbn.com (Richard Schooler) writes:
>> [description of problems with scheduling for writeback slot on 88k deleted]
>It looks like Motorola did not want to deal with functional unit
>latencies when instructions are issued.
>Otherwise they could use a result shift register where the "writeback slot"
>is reserved in advance according to the latency of the functional unit used.
>Conflicts are thus resolved in advance. Collisions cause stalling of
>the issuing unit.

Yep, a pretty straightforward thing to do control-wise.  You also want
to be able to forward either or both input operands from it, so it's a
bit more than a simple shift register, but nevertheless nice for handling 
medium latencies.  You might want to revert to a priority scheme for
divide though, or serialize.  Another benefit is that it keeps your 
register file updates in order.  Might have been just a tad too much
real estate for them though, assuming they considered it.
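
For anyone who hasn't seen one, here is a rough sketch of the reservation
idea in Python.  It is a toy model of my own, not a description of any real
chip: the class name, the method names and the one-write-port assumption are
all made up for illustration, and the operand forwarding mentioned above is
left out entirely.  Each entry stands for the writeback slot some number of
cycles in the future; issue reserves the entry matching the unit's latency,
and stalls if it is already taken.

```python
class ResultShiftRegister:
    """Toy model of a result shift register for a single write port.

    slots[k] holds the destination register that will write back
    k + 1 cycles from now, or None if that slot is still free.
    """

    def __init__(self, depth):
        self.slots = [None] * depth

    def try_issue(self, dest, latency):
        """Reserve the writeback slot `latency` cycles ahead (latency >= 1).

        Returns False, meaning issue must stall this cycle, if an
        earlier instruction has already reserved that slot.
        """
        index = latency - 1
        if self.slots[index] is not None:
            return False
        self.slots[index] = dest
        return True

    def tick(self):
        """Advance one cycle; return the register written back, if any."""
        retiring = self.slots.pop(0)
        self.slots.append(None)
        return retiring


if __name__ == "__main__":
    rsr = ResultShiftRegister(depth=4)
    rsr.try_issue("f2", 3)          # 3-cycle FP op issues first
    rsr.tick()
    rsr.tick()
    print(rsr.try_issue("r5", 1))   # collides with f2's slot, so issue stalls
    print(rsr.tick())               # f2 writes back
    print(rsr.try_issue("r5", 1))   # retry succeeds a cycle later
```

Since the conflict is detected at issue, the write port itself never needs
an arbiter, and results reach the register file in the order the slots were
reserved.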

>> [Richard's instruction-based priority bit scheme deleted]
>
>Let's see, instructions are held up because other instructions
>have higher priority, they will proceed only if there is an empty slot.
>If there is no empty slot, that means that other instructions are producing
>useful work and that the write-back slot is running at full throughput

But if a certain write slot doesn't satisfy a dependency that exists at
decode, then issue stalls, which will cause an empty write slot downstream
and lower throughput.  You would typically want to separate an FP instruction
from an instruction that depends on it; if you throw as many integer
instructions in between as necessary to compensate for the FP latency,
those integer instructions just end up stretching out the FP latency
by taking priority for writeback, and the extra FP latency isn't hidden
at all - ouch!  The proper priority scheme, IMHO, would give preference
to earlier instructions in the sequence, which are of course the ones with
the longer latency.  I'd like to hear their rationale for this.  Maybe
it was arbitrarily chosen and they just expected the compilers to keep
it all moving (bet they learned a lesson there!).  There's an example
of rescheduling in an article on the 88K in June's IEEE Micro, but it
conveniently avoids write slot contention with an appropriate mix of 
various latency operations. This of course does happen in the real world,
but there's no mention of what happens when things don't work out so nicely.
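
To put some numbers on the stretching effect, here is a small Python sketch
comparing the two arbitration policies.  It is a deliberately crude model of
my own, not the 88K's actual pipeline: one instruction issues per cycle, there
is a single write port, issue never stalls, and the function name and the
latencies (3 for the FP op, 1 for integer ops) are assumptions picked for
illustration.

```python
def writeback_cycles(latencies, oldest_first):
    """Return the cycle in which each instruction writes back.

    Instruction i issues at cycle i and its result is ready at cycle
    i + latencies[i].  One result may write back per cycle.  With
    oldest_first=False the shortest-latency result wins the port
    (integer ops beat FP); with oldest_first=True the earliest
    instruction in program order wins.
    """
    ready = {i: i + lat for i, lat in enumerate(latencies)}
    done = {}
    cycle = 0
    while len(done) < len(latencies):
        waiting = [i for i in ready if i not in done and ready[i] <= cycle]
        if waiting:
            if oldest_first:
                winner = min(waiting)
            else:
                winner = min(waiting, key=lambda i: (latencies[i], i))
            done[winner] = cycle
        cycle += 1
    return [done[i] for i in range(len(latencies))]


if __name__ == "__main__":
    program = [3, 1, 1, 1, 1, 1]    # one FP op, then five integer ops
    print(writeback_cycles(program, oldest_first=False))  # [7, 2, 3, 4, 5, 6]
    print(writeback_cycles(program, oldest_first=True))   # [3, 2, 4, 5, 6, 7]
```

In this model the FP result is ready at cycle 3 either way, but with
short-latency-first arbitration it does not reach the register file until
cycle 7, after every integer op that was scheduled to hide its latency.
With oldest-first it goes out at cycle 3 and each later integer op slips by
one cycle instead.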

In any case I personally wouldn't advocate putting something in the 
instruction set architecture simply to occasionally deal with the 
vagaries of a particular implementation.

----------------------
Dave Christie           My opinions only.