Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!ncar!noao!nud!tom
From: tom@nud.UUCP (Tom Armistead)
Newsgroups: comp.arch
Subject: RISC machines and scoreboarding
Message-ID: <1082@nud.UUCP>
Date: 17 Jun 88 19:39:23 GMT
Reply-To: tom@nud.UUCP (Tom Armistead)
Organization: Motorola Microcomputer Division, Tempe, Az.
Lines: 62

On RISC processors without a scoreboard*, how are the results of memory
references guaranteed to be available before they are used in subsequent
computations? I thought through several scenarios and couldn't find a
very good solution and, being unfamiliar with RISCs other than the 88k,
thought the net might have some answers.

1) If the processor waited until the load was complete before launching
the next instruction, that would guarantee the correct result was
available before an attempt was made to use it. This is obviously very
poor from a performance standpoint, though.

2) If the compiler or assembly-language programmer is required to wait a
certain number of ticks after a load before using the result of the
load, how is the required delay determined? Can you assume you always
get a cache hit (I doubt it)? Do you have to use the worst case (then
why use a cache at all)? What if the access is across a shared bus and
the tape controller is currently hogging the bus and will be for the
next 10 ms (you certainly wouldn't want to use a 10 ms delay after each
load)? Does the memory system have to give absolute top priority to CPU
requests so that the required delay is fixed? What if a memory refresh
were also pending? It seems you would then face a nasty choice between
letting memory be trashed and letting the CPU use bogus results.

Does #2 mean that a binary compiled for a particular system won't run on
a different system (having a different memory speed) even though both
have the same CPU chip, and that the source must instead be recompiled
for each system the program is to run on (or whenever the current system
is sped up or slowed down)?
This seems like it would be clumsy and difficult to manage from a
software distribution standpoint.

3) Is there some internal register that indicates when the result is
available, which software is required to interrogate to determine
whether the result is ready? This would add overhead to every load
instruction.

4) Is there some other method I haven't thought of that is the method of
choice?

Which RISC processors don't have scoreboarding, and what method do they
use?

* For those unfamiliar with scoreboarding: the scoreboard is merely an
internal register which keeps track of which registers have stale data
(for example, the destination register of a load instruction which has
been launched but not completed). If the CPU encounters an instruction
which uses a stale register, it waits until the scoreboard indicates the
register is valid again before it begins the instruction's execution,
thus ensuring that the instruction uses the correct operands. Note that
this is a hardware feature; software is not required to examine the
scoreboard to accomplish this.

As an example, on the 88k, you could do the following sequence:

        ld      r2,r0,0         ; Assume this ld takes a long time.
        add     r3,r2,1         ; Add 1 to result of ld and put in r3.

and the result would be correct (for any length of ld time). Without a
scoreboard, the result in r3 would be based on the stale data in r2
(which is probably the wrong value).

--
Just a few more bits in the stream.
The Sneek
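
To make the footnote concrete, here is a minimal sketch of the
scoreboard behavior in Python. It is a toy model, not the 88k's actual
implementation: the class ScoreboardCpu, its methods, the one-tick issue
cost, and the latency parameter are all names and assumptions made up
for illustration.

```python
class ScoreboardCpu:
    """Toy in-order CPU with a register scoreboard (illustrative only)."""

    def __init__(self, nregs=32, use_scoreboard=True):
        self.regs = [0] * nregs
        self.stale = [False] * nregs  # the scoreboard: one flag per register
        self.pending = []             # in-flight loads: [ticks_left, rd, value]
        self.use_scoreboard = use_scoreboard
        self.cycle = 0

    def tick(self):
        """Advance one clock and retire any load whose data has arrived."""
        self.cycle += 1
        for load in self.pending:
            load[0] -= 1
        for load in [l for l in self.pending if l[0] <= 0]:
            _, rd, value = load
            self.regs[rd] = value
            self.stale[rd] = False    # register is valid again
            self.pending.remove(load)

    def _wait_for(self, *regs):
        """Hardware interlock: stall while any named register is stale."""
        if not self.use_scoreboard:
            return
        while any(self.stale[r] for r in regs):
            self.tick()

    def ld(self, rd, value, latency):
        """Launch a load that delivers `value` to rd after `latency` ticks."""
        self._wait_for(rd)
        self.stale[rd] = True         # mark destination stale at launch
        self.pending.append([max(1, latency), rd, value])
        self.tick()                   # assumed: issuing costs one tick

    def add(self, rd, rs, imm):
        """rd = rs + imm, stalling first if rs (or rd) is still stale."""
        self._wait_for(rs, rd)
        self.regs[rd] = self.regs[rs] + imm
        self.tick()


def run(latency, use_scoreboard):
    """Run the two-instruction sequence; return (r3, cycles elapsed)."""
    cpu = ScoreboardCpu(use_scoreboard=use_scoreboard)
    cpu.regs[2] = 7                        # stale value left in r2 earlier
    cpu.ld(2, value=41, latency=latency)   # ld  r2,r0,0 (memory holds 41)
    cpu.add(3, 2, 1)                       # add r3,r2,1
    return cpu.regs[3], cpu.cycle
```

Under these assumptions, run(50, True) stalls the add until the load
retires and leaves 42 in r3, while run(50, False) finishes in two ticks
with r3 computed from the stale 7 in r2. The binary is the same in both
cases; only the number of stall ticks depends on the memory speed.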