Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Message-ID: <25559@amdcad.AMD.COM>
Date: 9 May 89 00:08:44 GMT
References: <907@aber-cs.UUCP> <25546@amdcad.AMD.COM> <25127@ames.arc.nasa.gov>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Distribution: eunet,world
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 42

In article <25127@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
| In article <25546@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
| >
| >The Am29000 calling convention says that global temporary registers are
| >killed across a procedure call, while values in local registers remain
| >alive.
|
| This certainly is a good way to do it.  Is the terminology you
| use for local and global standard?  (It is the opposite of what I have
| usually heard before, and seems counterintuitive - it seems that the
| temporaries should be local temporaries, although as you have used it
| the registers are "global temporary registers" because they are assigned
| to handle temporaries in all procedures- hence "global".)

The terms "local" and "global" refer to the different types of registers
on the Am29000 -- there are 64 global registers, which are addressed
absolutely, and 128 local registers, which are addressed relative to an
internal stack pointer (used to implement a software stack cache).  32
of the globals are reserved for the OS for important things (like TLB
miss page table pointers, etc.); there are 24 globals reserved for
temporary registers -- the rest are for stack pointers, etc.
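
As a rough illustration of the partition just described, here is a small
model in Python.  The register counts come from the paragraph above;
the function name, the parameter names, and the modulo-128 wraparound
of the local register file are assumptions made for the sketch, not
something taken from AMD documentation.

```python
# Illustrative model only. The counts are the ones quoted in the post;
# all names below are hypothetical.
NUM_GLOBAL = 64        # global registers, addressed absolutely
NUM_LOCAL = 128        # local registers, addressed relative to a stack pointer
OS_RESERVED = 32       # globals reserved for the OS
TEMPORARIES = 24       # globals used as temporaries (killed across calls)

# Whatever is left over is what the post calls "stack pointers, etc."
OTHER_GLOBALS = NUM_GLOBAL - OS_RESERVED - TEMPORARIES


def physical_local(stack_base: int, lr: int) -> int:
    """Map a local register number to a physical slot.

    Assumption: the local file behaves as a circular buffer, so the
    sum wraps modulo NUM_LOCAL.
    """
    if not 0 <= lr < NUM_LOCAL:
        raise ValueError("local register number out of range")
    return (stack_base + lr) % NUM_LOCAL
```

With these numbers OTHER_GLOBALS works out to 8, and the same local
register number names a different physical slot whenever the stack
base moves, which is what lets a callee get a fresh window without
copying the caller's values.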

| >A static analysis of 495 functions shows that an average of 6.6 global
| >registers and an average of 7.0 local registers are used per function,
| :
| >	11:  2.83% (14)		11:  3.03% (15)
| :
|
| An interesting set of figures.  Using 32 general purpose registers, with
| 16 for local and 16 for temporaries, would certainly seem to fit, given where
| the knee of the curve is.

Except for the constant saving and restoring of live variables which
occurs if you don't have stack cache hysteresis...

Also, techniques used to uncover more parallelism, such as loop
unrolling, software pipelining, and trace scheduling, tend to require
many more registers.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)