Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!ames!oliveb!mipos3!blabla!kds
From: kds@blabla.intel.com (Ken Shoemaker)
Newsgroups: comp.arch
Subject: Re: Register Scoreboarding
Message-ID: <113@mipos3.intel.com>
Date: 17 May 89 17:36:29 GMT
References: <24821@lll-winken.LLNL.GOV> <3288@orca.WV.TEK.COM> <19463@winchester.mips.COM> <170@dg.dg.com> <19661@winchester.mips.COM>
Sender: news@mipos3.intel.com
Reply-To: kds@blabla.UUCP (Ken Shoemaker)
Organization: Santa Clara Microprocessor Division, Intel Corp., Santa Clara, CA
Lines: 52

Perhaps I'm missing something here, but I'd like to make a few observations:

1) a large register array seems to be useful as a user-managed data cache

2) caches, in general, don't freeze up instruction execution while doing
   replaces (or, at least, they don't need to)

3) register scoreboarding provides for very simple support for cache (i.e.,
   register) reloads by an autonomous unit (kicked off by the processor,
   but not requiring exclusive use of the resource until the operation is
   complete). This requires that the "register reloader" has a write port
   to the registers aside from the processor write port, otherwise you
   have a resource constraint.

I don't think this approach is appropriate to the MIPS machines because of
their bus organization, i.e., they have a resource constraint in the use of
the external data bus.

But to paraphrase John's comments, such a machine, with the register
reloader autonomous functional unit, splits the operation of loading the
registers into two functional units whereas most current machines use a
single functional unit (the execution unit). If you have two such units, it
makes no sense to keep the execution unit frozen until you actually have the
registers reloaded, much like it doesn't make sense to keep the floating
point unit frozen while doing integer memory loads.

It would also have consequences for code reorganization.
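
To put rough numbers on the frozen versus non-frozen execution unit, here is
a toy cycle count, offered only as a sketch. Everything in it is an
assumption made for illustration: the 5-cycle load latency, the names run,
ready_at and freeze_on_load, and the idea that the reloader can have several
loads in flight at once. It models no real machine.

```python
LOAD_LATENCY = 5  # assumed memory latency in cycles, purely illustrative


def run(program, load_latency=LOAD_LATENCY, freeze_on_load=False):
    """Count cycles for a list of ('load', reg) and ('use', [regs]) operations.

    ready_at plays the role of the scoreboard: it records the cycle at which
    the (hypothetical) autonomous reloader writes each register.
    """
    ready_at = {}
    cycle = 0
    for op, arg in program:
        if op == "load":
            if freeze_on_load:
                # Execution unit is frozen until the data has arrived.
                cycle += load_latency
                ready_at[arg] = cycle
            else:
                # Hand the load to the reloader and continue next cycle.
                ready_at[arg] = cycle + load_latency
                cycle += 1
        elif op == "use":
            # Stall only if a source register is still marked pending.
            start = max([cycle] + [ready_at.get(r, 0) for r in arg])
            cycle = start + 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return cycle


# Each load placed immediately before its use.
interleaved = [("load", "r1"), ("use", ["r1"]),
               ("load", "r2"), ("use", ["r2"]),
               ("load", "r3"), ("use", ["r3"])]

# All loads hoisted into a preamble.
preamble = [("load", "r1"), ("load", "r2"), ("load", "r3"),
            ("use", ["r1"]), ("use", ["r2"]), ("use", ["r3"])]

print(run(interleaved))                    # 18
print(run(preamble))                       # 8
print(run(preamble, freeze_on_load=True))  # 18
```

Under these assumptions the hoisted version finishes in 8 cycles instead of
18, and the gain disappears entirely if the execution unit freezes on every
load.
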
Procedures, or maybe even whole programs (because, like Henry Spencer says,
compilers are real smart these days), would have a preamble which would load
the entire complement of variables into the registers. The concept of "load
delay slots" really becomes a don't care, because the registers are loaded
long before they are used. This is really just a smart data prefetch
algorithm driven by software (imagine that). The same kind of thing can be
done to force replaces in the external cache so that by the time you get
around to using a piece of data, it will be a cache hit and not require the
latency time to system memory.

Of course, this assumes that your register space isn't sufficiently large to
hold all the variables that are going to be used. I won't even worry about
volatile variables. They are such infrequent cases as to be a don't care.
For the time being, assume that there are no problems with multi-processor
data consistency between the registers, nee user-managed data cache, of
different processors or I/O devices.

I'm sorry if this is obvious. It certainly seems so to me, so there must be
something I am not seeing. I'd appreciate it if someone could straighten me
out. But as I am leaving for two months at the end of the week, and as we
don't keep news around on the system that long, I'd appreciate a note in my
mail instead. Thanks!
----------
I've decided to take George Bush's advice and watch his press conferences
	with the sound turned down...			-- Ian Shoales
Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|pur-ee|hacgate|oliveb}!intelca!mipos3!kds