Xref: utzoo comp.arch:17122 comp.compilers:1039
Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!yale!mintaka!snorkelwacker!spdcc!esegue!compilers-sender
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch,comp.compilers
Subject: Re: Register Allocation and Aliasing (really: zillions of transistors)
Keywords: optimize
Message-ID: <1990Jul14.223948.13994@esegue.segue.boston.ma.us>
Date: 14 Jul 90 22:39:48 GMT
References: <1990Jul06.194618.4957@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: mash@mips.COM (John Mashey)
Followup-To: comp.arch
Organization: MIPS Computer Systems, Inc.
Lines: 69
Approved: compilers@esegue.segue.boston.ma.us

In article <1990Jul06.194618.4957@esegue.segue.boston.ma.us> rfg@ncd.com (Ron Guilmette) writes:
>> Hare brained idea: allocate quantities that *might* be aliased to
>>registers anyway. Provide a register to contain the true memory
>>address of the aliased quantity, which causes a trap when the address
>>is accessed (or automagically forwards to/from the register). Not
>>only are aliasing problems avoided, but you've got a set of data
>>address breakpoint registers as well! (ie. this technique could be
>>experimentally evaluated on machines that have data address
>>breakpoints).

Some of this sounds interesting, and some may be useful in the future,
for various applications. However, one must be careful, especially in
a world of LIW, super-scalar, super-pipelined, and
super-scalar-super-pipelined multiple-issue machines (i.e., all RISCs
that expect to be competitive in the next few years), that you don't
stick something in a critical path that blows your cycle time by 50%....

Maybe this is a good time to expound a little on a related widespread
fantasy that might be called:
	When You Have A Zillion Transistors On A Chip, All Of Your
	Problems Go Away.

Most of the following is an over-simplified discussion of EE stuff from a
software guy's viewpoint; maybe some real VLSI types will correct goofs
and expound more on this topic:

It is clear that more space on a die helps a lot, and it lets you do
things like:
	bigger on-chip caches
		a wonderful thing: regular, dense, and transistors,
		rather than wires
		this includes: I-caches, D-caches, TLBs, branch-target
		buffers, pre-decoded instruction buffers, etc.
	monster-fast FP units and other arithmetic units
		for some kinds of units (like multipliers), more
		space ==> faster, reduces latency of operation,
		always a good thing.
	more copies of functional units, or more pipelining
		increases the repeat rate for an operation, which may
		help some kinds of things.
	wider busses, increasing intra-chip bandwidth.

On the other hand, there are some nasty facts of life (for CMOS, anyway):
	1) FAN-OUT is not free. Put another way, the more loads on a
	bus, the slower it is. Bigger transistors help, up to a point,
	but what usually happens is that you must cascade the gates to
	keep the total delay minimized.
	2) FAN-IN is not free either.
	3) WIRES DON'T SHRINK as fast as transistors (because the
	resistance increases as they get narrower). Hence, as you do
	shrinks to increase the speed and get more on a chip, the
	wires can gobble up more of the space.

Put another way:
	1) The more things listening to you, the slower you are.
	2) The more things you listen to, the slower you are.
	3) Don't think you can run monster busses all over the place
	for free.

All of this says that people STILL have to think very hard about delays
in the critical paths in a CPU. The faster you go, the more things
you're likely to be doing in parallel, but if you're not careful,
these factors can bite you badly, especially in a single-chip design that
can only dissipate so much heat.
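
To put rough numbers on the "cascade the gates" remark under FAN-OUT,
here is a small sketch. It is not from any real design flow; it assumes
a textbook-style normalized delay model in which each inverter stage
costs (its electrical fan-out + a fixed parasitic term) unit delays, and
the total load is split evenly across the stages. The function names,
the parasitic value of 1.0, and the load of 1000 are all made up for
illustration, and signal polarity is ignored.

```python
def chain_delay(fanout, stages, parasitic=1.0):
    """Delay, in normalized unit-inverter delays, of `stages` cascaded
    inverters driving a load `fanout` times the first gate's input load.

    Assumed model (not from the post): every stage sees the same
    electrical fan-out, fanout ** (1 / stages), and costs that much
    plus a fixed parasitic delay.
    """
    if stages < 1:
        raise ValueError("need at least one stage")
    effort_per_stage = fanout ** (1.0 / stages)
    return stages * (effort_per_stage + parasitic)


def best_stage_count(fanout, max_stages=12):
    """Stage count (1..max_stages) with the smallest modeled delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay(fanout, n))


if __name__ == "__main__":
    # Hypothetical bus whose load is 1000x one gate input.
    for n in range(1, 8):
        print(n, "stages:", round(chain_delay(1000, n), 1))
    print("best:", best_stage_count(1000))
```

Under these assumptions, one gate driving the whole load directly costs
about 1001 unit delays, while a chain of five costs about 25. Going past
five starts to cost more again, which is the "not free" part: the
cascade helps a great deal, but the added stages are still delay sitting
in your critical path.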
--
-john mashey	DISCLAIMER:
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue. Meta-mail to compilers-request@esegue.