Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!tektronix!orca!frip!andrew
From: andrew@frip.WV.TEK.COM (Andrew Klossner)
Newsgroups: comp.arch
Subject: Re: Register Scoreboarding (specifically 88k)
Message-ID: <3349@orca.WV.TEK.COM>
Date: 16 May 89 20:53:55 GMT
References: <24821@lll-winken.LLNL.GOV> <3288@orca.WV.TEK.COM> <19463@winchester.mips.COM> <170@dg.dg.com> <19661@winchester.mips.COM>
Sender: nobody@orca.WV.TEK.COM
Organization: Tektronix, Wilsonville, Oregon
Lines: 28

> Finally, maybe somebody from 88K-land could describe how far into
> out-of-order execution the 88K goes, i.e., assuming no scoreboard
> block,
>   1) how many instructions can be issued beyond a load that
>      cache-misses, or tlb-misses, or both?
>   2) how many instructions beyond a stalled-FP-multiply
>      (for example) can you execute?

[I'm a user, not a designer, of the 88k architecture.]

The instruction fetch unit, the data fetch/store unit, and the two FPU
pipelines (one multiply, one add/subtract/convert/divide) operate
fairly independently. There is no inherent reason why a stall by one
would force the others to stall, although, as John pointed out, if you
stall on instruction fetch, the other units are going to starve pretty
quickly.

1) Caching and MMU translation are done external to the CPU, in the
CMMU (cache/MMU), so cache- and TLB-misses are all the same to the CPU;
they just mean that the CMMU will take a little longer to finish the
load. If you don't use the load target and you don't issue another
load or store, you can continue executing forever. On our system, the
code- and data-CMMUs talk to the same memory bus ("M bus" in Motorola
parlance), so if you take a code-cache miss during a load, the code
CMMU will have to wait for the bus.

2) Again, if you don't use the FP-multiply target and you don't issue
another FP-multiply, you can continue executing forever.
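
The two answers above amount to one issue rule: an instruction goes ahead unless it touches a register whose result is still pending, or needs a unit that is still busy with an earlier operation. Here is a toy model of that rule in Python. It is a sketch for illustration, not a description of the real 88k pipeline: the unit names, the latencies, the one-instruction-per-cycle issue rate, and the limit of one outstanding operation per unit are all assumptions, the last one read off the wording of the answers rather than taken from any Motorola documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    unit: str          # "int", "mem", "fmul" or "fadd" (hypothetical unit names)
    dest: str = ""     # destination register, "" if none
    srcs: tuple = ()   # source registers
    latency: int = 1   # assumed cycles until the result is written back


def issue_until_stall(program):
    """Issue in order, one instruction per cycle.

    Returns how many instructions issue before the first one that would
    have to wait, or len(program) if nothing ever waits.
    """
    reg_ready = {}   # register -> cycle at which its pending result arrives
    unit_free = {}   # unit -> cycle at which it can accept another operation
    for cycle, ins in enumerate(program):
        regs = ins.srcs + ((ins.dest,) if ins.dest else ())
        # Scoreboard block: a source or destination is still pending.
        if any(reg_ready.get(r, 0) > cycle for r in regs):
            return cycle
        # Unit block: assumed limit of one outstanding operation per unit.
        if ins.unit != "int" and unit_free.get(ins.unit, 0) > cycle:
            return cycle
        if ins.dest:
            reg_ready[ins.dest] = cycle + ins.latency
        if ins.unit != "int":
            unit_free[ins.unit] = cycle + ins.latency
    return len(program)


# A load that misses (assumed 50-cycle latency), ten unrelated integer adds,
# then the first use of the load target.
slow_load = Instr("mem", "r2", ("r3",), latency=50)
unrelated = [Instr("int", "r4", ("r4", "r5"))] * 10
use_target = Instr("int", "r6", ("r2",))
print(issue_until_stall([slow_load] + unrelated + [use_target]))
```

In this model the count grows with the number of unrelated instructions placed between the load and the first use of its target, which is the "continue executing forever" case. Replacing the first add with a second `mem` instruction makes the function return 1, the "another load or store" case.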