Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Register usage
Message-ID: <25786@amdcad.AMD.COM>
Date: 30 May 89 15:53:17 GMT
References: <259@mindlink.UUCP> <25382@ames.arc.nasa.gov>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 35

In article mcg@mipon2.UUCP (Steven McGeady) writes:
| One thing that no one has yet pointed out is that a reason not to implement
| huge directly-addressable register files is that, in any reasonable
| implementation, the register file must be multi-ported.  A six-ported register
| cell is about 5x the size of a single-ported register cell or a cache RAM
| cell.  To more fully utilize micro-parallelism in an architecture, more
| sources and results need to be fetched from the register file simultaneously,
| thus the additional ports.

I don't see why this is a reason not to implement large register files.
You need to apply transistors where they will do the most good.  Note
that processors that attempt to "more fully utilize micro-parallelism"
also tend to want to have more general-purpose registers available to
maintain full performance.

| This, IMHO, is one of the greatest flaws of the 29k - it exposes 192
| (actually 256) architectural registers.  In the current implementation they
| are (I believe) 3-ported, and even now occupy a large amount of the 29k
| die space.  I believe that they will run into serious problems if they
| ever attempt to dispatch and execute multiple general instructions per cycle.

Well, we see no problems in either our 2nd or 3rd generation parts...

| The 960, on the other hand, exposes 32 general registers architecturally,
| but because 16 of these are "local", and saved/restored on call/return
| to/from architecturally-hidden resources, we can easily move from a cheap
| (1-ported by 4 sets) register implementation, to a very fast one
| (6-ported by 8+ sets) in high-performance implementations.

So you *will* be looking at large register files (128+, 6-ported) for
high performance.

-- 
Tim Olson
Advanced Micro Devices
(tim@amd.com)