Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!sun-barr!sun!imagen!atari!portal!cup.portal.com!bcase
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: Register usage
Message-ID: <18965@cup.portal.com>
Date: 30 May 89 17:32:38 GMT
References: <259@mindlink.UUCP> <25382@ames.arc.nasa.gov>
Organization: The Portal System (TM)
Lines: 112

>A six-ported register cell is about 5x the size of a single-ported
>register cell or a cache RAM cell.

This could be, depending on technology details, of course. (The width of metal dominates, usually?)

>This, IMHO, is one of the greatest flaws of the 29k - it exposes 192
>(actually 256) architectural registers. In the current implementation they
>are (I believe) 3-ported, and even now occupy a large amount of the 29k
>die space. I believe that they will run into serious problems if they
>ever attempt to dispatch and execute multiple general instructions per cycle.

I can speak to this point with a little authority. The register file in the current implementation is indeed 3-ported. I must admit that I have had second thoughts about making the register file so big. The advantages are many, but the cost of additional ports is indeed bigger than for other architectures (boy, the '386 architecture has a leg up here! :-).

The current 29K implementation has about 1/5 of the usable (i.e., non-pad ring) die area dedicated to the register file using 1.26 (Dave Witt: what is it really?) micron technology. At about 1 micron, that will shrink to about 1/9 of the usable area, and at .8 micron, about 1/11. Increasing the number of ports to 6, let's say, will increase the size by about a factor of 1-2/3 (from 3x a single-ported cell to, using Steve's number, about 5x a single-ported cell). Thus, at 1 micron, the register file will use about 18% of the usable die area, and at .8 micron about 15% of the area.

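For anyone who wants to check the arithmetic, here is a minimal Python sketch of the scaling above. The 1/9 and 1/11 shares and the 3x and 5x cell sizes are the figures quoted in this post; the names (`THREE_PORT_SHARE`, `six_port_share`) are made up for illustration, and the sketch assumes the usable die area stays fixed while the register file grows.

```python
from fractions import Fraction

# Share of usable die area taken by the 3-ported, 192-register file
# at each process node (figures quoted in the post).
THREE_PORT_SHARE = {
    "1.0 micron": Fraction(1, 9),
    "0.8 micron": Fraction(1, 11),
}

# Cell size relative to a single-ported cell (the post's 3x and 5x figures).
CELL_SIZE_3_PORT = 3
CELL_SIZE_6_PORT = 5


def six_port_share(three_port_share):
    """Scale a 3-ported register file's area share up to 6 ports.

    Assumes the usable die area itself does not grow.
    """
    return three_port_share * Fraction(CELL_SIZE_6_PORT, CELL_SIZE_3_PORT)


for node, share in THREE_PORT_SHARE.items():
    print(f"{node}: {float(six_port_share(share)):.1%} of usable die area")
```

This gives 5/27 and 5/33 of the usable area, which is where the "about 18%" and "about 15%" come from.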
This is not an insignificant amount of area, but it is not "too" much in my opinion. Why? Because having lots of registers IS GOOD. If the 29K spends more area on a 6-ported data storage resource than other processors, I think it's an advantage as long as it's not taking "too" much area! Up to a point, I would rather spend area on a 6-ported resource than a 1-ported resource (cache). I guess we are talking about where the "point" is in "up to a point."

On the other hand, a 6-ported, 32-register file would be about (I am guessing by simple scaling) 2% to 5% of the usable die area. On the 960, some of the 13% difference is used for the register file "backing store", but not much, say another 3% (I dunno what it really is). So, the 960 has a 10% "bonus" chunk of die area. What can be done with it? At some points on the technology curve, it will have a larger cache than the 29K at the same point. At some other points, the 10% die area will "only" result in a smaller die because 10% isn't enough to increase the size of a cache or an FP somethingorother (but 10% smaller die are cheaper die, depending on yield, greediness, etc.).

So the 960 seems to have a slight implementation advantage. What's the real difference? I don't know, because 10% is a small enough amount that it can be lost in the "noise" of implementation! If one guy uses automatic tools and the other uses full custom design/layout, some difference will result. Also, just due to "the way things are" the 10% might not be usable. Some die are square while some are rectangular because of "the way things are."

I am making this argument in full recognition of an argument about CISCs that I used to believe: "CISCs will always have an implementation disadvantage because of the microcode ROM." This is bogus for the same reason: as the technology improves, the ROM itself shrinks until it can no longer be seen with the naked eye!

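Going back to the 32-register comparison a few paragraphs up, the "simple scaling" guess can be sketched in Python as follows. This assumes area share scales linearly with register count, which ignores decoders and other fixed overhead; the helper name `scale_by_register_count` is invented for illustration, and the 18% and 15% inputs are the rounded 6-ported 29K figures from earlier in this post.

```python
# Register counts being compared: the 29K's 192 versus a 32-register file.
REGS_29K = 192
REGS_32 = 32


def scale_by_register_count(share_192, registers):
    """Linear scaling of die-area share with register count (an assumption)."""
    return share_192 * registers / REGS_29K


# Rounded 6-ported 29K shares quoted earlier in the post.
for node, share_29k in (("1.0 micron", 0.18), ("0.8 micron", 0.15)):
    share_32 = scale_by_register_count(share_29k, REGS_32)
    print(f"{node}: 29K {share_29k:.0%}, 32-register file {share_32:.1%}, "
          f"difference {share_29k - share_32:.1%}")
```

Under that assumption the 32-register file comes out at roughly 2.5% to 3% of the usable area, inside the 2% to 5% guess, and the gap to the 29K lands in the same neighborhood as the 13% used above.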
What constitutes either a current RISC processor or a current CISC processor (the PROCESSOR pipeline, not the caches, TLBs, etc.) would be a very small corner on the die if implemented in the technology of 1995. However, what constitutes a current RISC or CISC processor will be wholly uninteresting in 1995. For the implementations to come, including superscalar stuff, the issues will be the complexity of implementation and the cost (read: people) of realizing that implementation. This is where CISCs, I believe, will falter. One of the great advantages of RISCs is that they are conceptually easier to implement. This effect is compounded with increasing ambitions for greater performance: the fewer interactions between instructions, the easier a multi-instruction-per-cycle implementation will be to construct. In 1995, there might be another reactionary simplification movement in computer architecture; maybe current RISCs are too complex! Too many special cases! But I digress...

Size will still matter, but I believe it will be dominated by the many connections (buses) and small structures required to handle the special cases and resource interactions, not the larger, regular structures like register files and caches (although we will still be trying to fit as much cache as possible). Thus, the 14-ported 192-register file of the 29K will still be larger than the 14-ported 32-register files of the 960, MIPS, i860, etc., but it won't matter because the 29K's register file will be 1% of the die while the 960's will be 0.1%. (BTW, the register file will be the center of the processor, not at one end of the data path as it is now.) So I believe.

>The 960, on the other hand, exposes 32 general registers architecturally,
>but because 16 of these are "local", and saved/restored on call/return
>to/from architecturally-hidden resources, we can easily move from a cheap
>(1-ported by 4 sets) register implementation, to a very fast one
>(6-ported by 8+ sets) in high-performance implementations.

This indeed gives more flexibility in implementation choices. The 29K's register file gives more flexibility in choices for using the register file. Only time will tell if more advantage is gained from having implementation choices or from use choices. If you believe that, in the end, business issues dominate, then maybe the business issue of cost is more important.

>The cut line is different on every architecture - 32 is sufficient on the
>960, but I am not disagreeing with Wall's estimate of 64. Certainly
>in floating-point intensive scientific applications dominated by
>double-precision arithmetic in loops, more registers are needed. But
>substantially more than 64 seems to limit architectural flexibility
>quite severely.

That should be "more than 64 seems to limit *implementation* flexibility." The 29K has more architectural flexibility: the register file can be used as a stack cache, a flat pool of 192 registers, or as a few pools of a smaller number of registers. Is this important? I dunno yet.

These are just a few of my opinions mixed up with some pseudo-facts. Don't believe any of them!

"I did it my way." - Sinatra.