Path: utzoo!attcan!uunet!decwrl!apple!amdcad!mozart.amd.com!nucleus!davec
From: davec@nucleus.amd.com (Dave Christie)
Newsgroups: comp.arch
Subject: Re: High-Priority Instructions
Message-ID: <1990Jul27.161856.25701@mozart.amd.com>
Date: 27 Jul 90 16:18:56 GMT
References: <58428@bbn.BBN.COM> <37310@shemp.CS.UCLA.EDU>
Sender: usenet@mozart.amd.com (Usenet News)
Reply-To: davec@nucleus.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 48

In <37310@shemp.CS.UCLA.EDU> marc@oahu.cs.ucla.edu (Marc Tremblay) writes:
>In article <58428@bbn.BBN.COM> schooler@oak.bbn.com (Richard Schooler) writes:
>> [description of problems with scheduling for writeback slot on 88k deleted]
>It looks like Motorola did not want to deal with functional unit
>latencies when instructions are issued.
>Otherwise they could use a result shift register where the "writeback slot"
>is reserved in advance according to the latency of the functional unit used.
>Conflicts are thus resolved in advance. Collisions cause stalling of
>the issuing unit.

Yep, a pretty straightforward thing to do control-wise. You also want to
be able to forward either or both input operands from it, so it's a bit
more than a simple shift register, but it's nevertheless nice for
handling medium latencies. You might want to revert to a priority scheme
for divide, though, or just serialize. Another benefit is that it keeps
your register file updates in order. It might have been just a tad too
much real estate for them, though, assuming they considered it.

>> [Richard's instruction-based priority bit scheme deleted]
>
>Let's see, instructions are held up because other instructions
>have higher priority, they will proceed only if there is an empty slot.
>If there is no empty slot, that means that other instructions are producing
>useful work and that the write-back slot is running at full throughput

But if a given write slot doesn't satisfy a dependency that exists at
decode, then issue stalls, which causes an empty write slot downstream
and lowers throughput. You would typically want to separate an FP
instruction from an instruction that depends on it; if you throw as many
integer instructions in between as necessary to cover the FP latency,
those integer instructions just end up stretching out the FP latency by
taking priority for writeback, and the extra FP latency isn't hidden at
all. Ouch!

The proper priority scheme, IMHO, would give preference to earlier
instructions in the sequence, which, among instructions contending for
the same slot, are of course the ones with the longer latency. I'd like
to hear their rationale for this. Maybe it was chosen arbitrarily and
they just expected the compilers to keep it all moving (bet they learned
a lesson there!).

There's an example of rescheduling in an article on the 88K in June's
IEEE Micro, but it conveniently avoids write slot contention with a
suitable mix of operations of various latencies. Such a mix does of
course occur in the real world, but there's no mention of what happens
when things don't work out so nicely.

In any case, I personally wouldn't advocate putting something in the
instruction set architecture just to deal with the occasional vagaries
of a particular implementation.

----------------------
Dave Christie      My opinions only.
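The writeback-priority argument above can be made concrete with a toy simulation. This is a minimal sketch under loudly stated assumptions, not a model of the actual 88K pipeline: an in-order, single-issue machine with one register-file write port; results wait in their unit until they win the port; a consumer cannot issue until its producer has written back (same-cycle write-then-read allowed, no other forwarding); and illustrative latencies of 5 cycles for FP and 1 for integer. The names `Insn`, `simulate`, `make_program`, and the policy labels `short_first` (integer results win, as the thread describes) and `oldest_first` (the scheme proposed above) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    name: str
    latency: int                      # cycles from issue until the result is ready
    depends_on: Optional[int] = None  # index of the producer this insn reads
    issue: int = -1                   # cycle issued (filled in by simulate)
    writeback: int = -1               # cycle result written (filled in by simulate)


def simulate(program: List[Insn], policy: str) -> List[Insn]:
    """Toy in-order, single-issue machine with ONE register-file write port.

    policy "short_first":  shorter-latency (integer) results win the port.
    policy "oldest_first": the earliest instruction in program order wins.
    """
    if policy not in ("short_first", "oldest_first"):
        raise ValueError(f"unknown policy: {policy}")
    cycle, next_issue = 0, 0
    pending: List[int] = []  # issued instructions still waiting to write back
    while next_issue < len(program) or pending:
        # Writeback: at most one ready result gets the write port this cycle.
        ready = [i for i in pending
                 if program[i].issue + program[i].latency <= cycle]
        if ready:
            if policy == "short_first":
                winner = min(ready, key=lambda i: (program[i].latency, i))
            else:
                winner = min(ready)  # lowest index = earliest in program order
            program[winner].writeback = cycle
            pending.remove(winner)
        # Issue: in order; stall if the producer has not written back yet
        # (write-then-read in the same cycle is allowed; no other forwarding).
        if next_issue < len(program):
            insn = program[next_issue]
            dep = insn.depends_on
            if dep is None or program[dep].writeback != -1:
                insn.issue = cycle
                pending.append(next_issue)
                next_issue += 1
        cycle += 1
    return program


def make_program(fp_latency: int, n_int: int) -> List[Insn]:
    """fmul, then n_int independent 1-cycle integer ops, then an fadd using fmul."""
    prog = [Insn("fmul", fp_latency)]
    prog += [Insn(f"add{k}", 1) for k in range(n_int)]
    prog.append(Insn("fadd", fp_latency, depends_on=0))
    return prog


if __name__ == "__main__":
    for policy in ("short_first", "oldest_first"):
        prog = simulate(make_program(5, 4), policy)
        print(policy, "fmul writeback:", prog[0].writeback,
              "fadd issue:", prog[-1].issue)
```

With four filler adds (just enough to cover the assumed 5-cycle FP latency), the last add's result becomes ready in the same cycle as the fmul's. Under `short_first` the fmul loses the port and the dependent fadd issues one cycle late (cycle 6 instead of 5), so the filler doesn't fully hide the latency. With eight filler adds, the fmul's writeback slips from cycle 5 to cycle 10 under `short_first`, because it is starved for as long as the integer stream keeps the port busy.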