Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!tektronix!orca!frip!andrew
From: andrew@frip.WV.TEK.COM (Andrew Klossner)
Newsgroups: comp.arch
Subject: Re: Register Scoreboarding (specifically 88k)
Message-ID: <3349@orca.WV.TEK.COM>
Date: 16 May 89 20:53:55 GMT
References: <24821@lll-winken.LLNL.GOV> <GRUNWALD.89May9113443@flute.cs.uiuc.edu> <3288@orca.WV.TEK.COM> <19463@winchester.mips.COM> <170@dg.dg.com> <19661@winchester.mips.COM>
Sender: nobody@orca.WV.TEK.COM
Organization: Tektronix, Wilsonville, Oregon
Lines: 28

> Finally, maybe somebody from 88K-land could describe how far into
> out-of-order execution the 88K goes, i.e., assuming no scoreboard
> block,
> 	1) how many instructions can be issued beyond a load that
> 	cache-misses, or tlb-misses, or both?
> 	2) how many instructions beyond a stalled-FP-multiply
> 	(for example) can you execute?

[I'm a user, not a designer, of the 88k architecture.]

The instruction fetch unit, the data fetch/store unit, and the two FPU
pipelines (one multiply, one add/subtract/convert/divide) operate
fairly independently.  There is no inherent reason why a stall by one
would force the others to stall, although, as John pointed out, if you
stall on instruction fetch, the other units are going to starve pretty
quickly.
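Here's a toy model of that independence. It is a sketch under my own
simplifying assumptions, not Motorola's logic: the class `Units` and
the unit names are made up for illustration. Each unit tracks only its
own busy state, so a stall in one doesn't block the others, but nothing
new reaches any unit while instruction fetch is stalled.

```python
class Units:
    """Toy model: each functional unit tracks its own busy state (hypothetical)."""

    def __init__(self, names):
        self.busy = {name: False for name in names}

    def stall(self, name):
        # e.g. a load waiting on the CMMU, or an FP multiply still in flight
        self.busy[name] = True

    def release(self, name):
        self.busy[name] = False

    def can_issue_to(self, name):
        # A unit can start new work only if it is free AND instruction fetch
        # is still delivering instructions; a fetch stall starves everyone.
        return not self.busy["fetch"] and not self.busy[name]


units = Units(["fetch", "data", "fp_mul", "fp_add"])
units.stall("data")
print(units.can_issue_to("fp_mul"))   # True: data-unit stall doesn't matter
print(units.can_issue_to("data"))     # False
units.release("data")
units.stall("fetch")
print(units.can_issue_to("fp_add"))   # False: starved by the fetch stall
```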

1) Caching and MMU translation are done externally to the CPU, in the
CMMU (cache/MMU), so cache and TLB misses are all the same to the CPU;
they just mean that the CMMU will take a little longer to finish the
load.  If you don't use the load target and you don't issue another
load or store, you can continue executing forever.  On our system, the
code- and data-CMMUs talk to the same memory bus ("M bus" in Motorola
parlance), so if you take a code-cache miss while a missed load is
using the bus, the code CMMU will have to wait for it.

2) Again, if you don't use the FP-multiply target and you don't issue
another FP-multiply, you can continue executing forever.
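To put both answers in one place, here's a rough sketch of the issue
rule as I've described it. It assumes in-order issue, a scoreboard of
registers with results still outstanding, and a unit that won't take
another operation while one is stalled (the "another load or store" and
"another FP-multiply" cases). The function name and the tuple encoding
of instructions are hypothetical, and the real 88100 pipelines have
more detail than this.

```python
def issued_past_stall(stream, pending, busy_units):
    """Count instructions issued, in program order, before the first one
    that must wait.

    stream:     list of (unit, srcs, dest) tuples (hypothetical encoding)
    pending:    registers whose results are still outstanding
    busy_units: units that cannot accept another operation yet
    """
    count = 0
    for unit, srcs, dest in stream:
        if (unit in busy_units or dest in pending
                or any(r in pending for r in srcs)):
            break  # in-order issue: everything behind this one waits too
        count += 1
    return count


# Question 1: a load into r2 has missed; the data unit is tied up.
load_miss = [
    ("int", ["r4", "r5"], "r3"),      # add r3,r4,r5     -> issues
    ("fp_add", ["r6", "r7"], "r8"),   # FP add           -> issues
    ("data", ["r9"], "r10"),          # another load     -> waits
]
print(issued_past_stall(load_miss, {"r2"}, {"data"}))        # 2

# Question 2: an FP multiply into r12 is outstanding.
fmul_stall = [
    ("int", ["r1", "r2"], "r3"),      # add              -> issues
    ("data", ["r9"], "r10"),          # load             -> issues
    ("int", ["r12", "r1"], "r13"),    # uses the product -> waits
]
print(issued_past_stall(fmul_stall, {"r12"}, {"fp_mul"}))    # 2
```

A stream that never touches the pending register or the busy unit
issues in full, which is the "continue executing forever" case.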