Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!agate!bionet!apple!bloom-beacon!tut.cis.ohio-state.edu!mailrus!ncar!tank!uxc!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: VLIW (was please re-send mail)
Message-ID: <28200228@urbsdc>
Date: 5 Nov 88 21:24:00 GMT
References: <70@armada.UUCP>
Lines: 71
Nf-ID: #R:armada.UUCP:70:urbsdc:28200228:000:3666
Nf-From: urbsdc.Urbana.Gould.COM!aglew Nov 5 15:24:00 1988

>The 88000 does even better. Rather than requiring all instructions to
>contain several operations, each instruction *starts* one operation. The
>target register (source for a store operation) is marked "busy" so that
>the next reference will wait for the operation to finish. This
>allows the same parallelism as VLIW without wasting code memory bandwidth
>on empty slots. From what I understand, the current chip does addition
>and logic in one instruction cycle (thus not parallelizing these operations),
>but load, store, multiply/divide, floating point use the scheme described.
>A neat advantage of the hardware bit is that the compiler does not need to
>know exact timings to ensure correct execution. Timing data enhances
>optimization, but is not necessary to ensure correctness.
>
>I believe this technique is called "scoreboarding".
>
>A later version could parallelize short instructions also if instruction
>fetching became much faster than addition and logic.
>
>Stuart D. Gathman
>	<..!{vrdxhq|daitc}!bms-at!stuart>

I have to be careful saying this, since I now work for Motorola, but it
should be obvious that scoreboarding cannot take you as far as VLIW.

Scoreboarding is an appropriate choice for the current level of
microprocessor technology, but any computer architect will tell you that
you eventually have to get past the one operation/cycle dispatch limit
(well, maybe not Norm Jouppi, at DEC, who published an interesting paper
titled something like "Superpipelined vs. Superparallel" computers in CAN
a while back).

Scoreboarding lets you have multiple operations in flight at once, but
still, typically, you only dispatch one operation per instruction cycle.
That means that, on average, only one operation can complete per
instruction cycle, which caps throughput. To get faster, you either have
to decrease the cycle time or increase the number of operations
dispatched/completed per instruction cycle.

Note that scoreboarding doesn't even get you to 1 operation/cycle
dispatch; you still stall whenever a register you need is busy.

The next step past scoreboarding is Tomasulo instruction scheduling,
which lets you continue to dispatch instructions even though previous
instructions have not yet received the data they need to begin
execution. Berkeley's Aquarius project was the last group I know of to
try this. Tomasulo scheduling seems to be a hard subject, but every group
that tries it makes it a little bit easier. Tomasulo on an instruction
set with a single operation per instruction lets you approach
1 operation/instruction cycle dispatch.

Both scoreboarding and Tomasulo can be used to dispatch multiple
instructions per cycle as well as one, and that is how you get past the
one-instruction dispatch limit. This is just easier to do in a VLIW
instruction set, where the operations in an instruction are guaranteed
to be independent; it can be done for multiple possibly dependent
operations per cycle, but the dependency checking gets expensive.

Andy "Krazy" Glew.
at: Motorola Microcomputer Division, Champaign-Urbana Development Center
    (formerly Gould CSD Urbana Software Development Center).
mail: 1101 E. University, Urbana, Illinois 61801, USA.
email: (Gould addresses will persist for a while)
    aglew@gould.com      - preferred, if you have MX records
    aglew@fang.gould.com - if you don't
    ...!uunet!uiucuxc!ccvaxa!aglew - paths may still be the only way

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.

PS. I promise to shorten this .signature as soon as our new mail paths are set.
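
PPS. For anyone who wants the busy-bit argument above in concrete form,
here is a rough Python sketch of single-issue, in-order dispatch with one
busy bit per register. It is a toy model under stated assumptions, not a
description of the 88000: the Op record, the register names, and the
latencies are all made up for illustration, and stores and functional
unit conflicts are ignored.

```python
from collections import namedtuple

# Hypothetical instruction record (not any real ISA): a name, one
# destination register, a tuple of source registers, and a result
# latency in cycles. All values below are illustrative assumptions.
Op = namedtuple("Op", "name dest srcs latency")


def run_scoreboard(program):
    """Dispatch ops in order, at most one per cycle, with busy bits.

    Returns (dispatch_cycles, stalls): the cycle in which each op was
    dispatched, and the number of cycles in which nothing could be
    dispatched because a needed register was still marked busy.
    """
    ready_at = {}  # register -> first cycle in which its busy bit is clear
    dispatch_cycles = []
    stalls = 0
    cycle = 0
    for op in program:
        # Stall while any source, or the destination, is still busy.
        while any(ready_at.get(r, 0) > cycle for r in (*op.srcs, op.dest)):
            cycle += 1
            stalls += 1
        dispatch_cycles.append(cycle)
        ready_at[op.dest] = cycle + op.latency  # busy until the result lands
        cycle += 1  # the dispatch limit: one op per cycle, never more
    return dispatch_cycles, stalls


program = [
    Op("load", "r1", ("r10",), 3),
    Op("mul", "r2", ("r3", "r4"), 4),
    Op("add", "r5", ("r1", "r6"), 1),  # needs the load result
    Op("add", "r7", ("r2", "r5"), 1),  # needs the multiply result
]

dispatch_cycles, stalls = run_scoreboard(program)
print(dispatch_cycles, stalls)  # [0, 1, 3, 5] 2
```

The load and the multiply overlap, which is the parallelism scoreboarding
buys you, but the four operations still need cycles 0 through 5 to
dispatch: two cycles are lost to busy registers, and no cycle ever
dispatches more than one operation. Reordering the program changes the
stall count but never the result, which is the point made in the quoted
article about timing data being an optimization rather than a
correctness requirement.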