Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!boulder!sunybcs!bingvaxu!leah!itsgw!batcomputer!pyramid!prls!mips!hansen
From: hansen@mips.UUCP (Craig Hansen)
Newsgroups: comp.arch
Subject: Re: register windows
Message-ID: <833@mips.UUCP>
Date: Sat, 24-Oct-87 17:53:09 EST
Article-I.D.: mips.833
Posted: Sat Oct 24 17:53:09 1987
Date-Received: Mon, 26-Oct-87 05:38:46 EST
References: <201@PT.CS.CMU.EDU> <933@cpocd2.UUCP> <821@mips.UUCP> <18843@amdcad.AMD.COM>
Lines: 66
Keywords: register windows, interrupt latency
Summary: a possible explanation, but a wrong one

In article <18843@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
> [a comparison of load/store frequencies and non-zero offset frequencies]
> Now, since MIPS didn't publish the input they used on these programs to
> derive their numbers, we obviously cannot perform a direct comparison.

Even if the input were the same, the instruction counts aren't
necessarily the same on these two machines, so the comparison of
load/store frequencies isn't really relevant. In addition, as1 and
am29k aren't even the same program! (There are several variants of
nroff around, too.)

> We assume, however, that the input wasn't "specialized" in any way.
> One possible explanation for this is that non-register-windowed machines
> have many more loads/stores to the stack during procedure call
> entry/exit, resulting in higher load/store percentages as well as higher
> utilization of the base+offset addressing mode.

Yes, a possible explanation, but wrong.

Compiling nroff (a version released as part of UMIPS-BSD) with our
inter-procedural register allocator, and using an input file of
700 lines, 4670 words, and 30530 characters, we get the following
dynamic statistics:

    26.4% loads+stores
    52.1 instructions per call
    Average registers saved per call: 1.6

    Register save+restore: 23.3% of loads+stores (7.0% of instructions)
        (This percentage includes variables which must reside in
        memory to handle call by address parameters.)
        If you eliminated all these register save/restores,
        load+store frequency would be 20.9% (remember that the
        total instruction count changes too).

    Loads/stores to variables through global pointer register: 15.2%
        (Because of use of the global pointer register, these
        references are single instructions, all of which have
        non-zero offset values.)
        I would suspect that the Am29000 statistics do not count
        these as non-zero offset values, though they are more than
        half of all the references.

    Other load/stores: 4.2%
        (Zero-valued offsets are frequent within this sub-class,
        but are only about 3% of total instruction references.)

> The main point to realize here is that a series of trade-offs are made
> in any processor design, and what might be a big win for one processor
> may have minimal impact on another. These individual tradeoffs cannot
> be taken as "absolutes" without examining their relationship with the
> rest of the architecture.

Amen. Tim should also be careful, though, to realize that the tools
and compilers used to measure the architecture also influence the
results. Unless you set out to design both the architecture and the
compiler concurrently, you won't be able to make good trade-offs
between them.

It should also be noted, again, that the selection of programs used
as benchmarks will influence the results too. Making your
architectural trade-offs on the basis of Dhrystone, or even nroff,
which is not very representative of more modern C code, isn't very
smart.
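
The nroff figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's check, not part of the original measurements. It shows why removing 7.0% of all instructions (every one of them a load or store) leaves 20.9% loads+stores rather than 19.4%: the denominator shrinks along with the numerator. It also tries one reading of the 23.3% figure, which is an assumption here rather than something the post states: that each saved register costs one store and one load per call, with the call-by-address variables accounting for the rest of the 7.0%. The function and constant names are made up for illustration.

```python
# Figures quoted in the post (dynamic statistics for nroff).
LOAD_STORE_PCT = 26.4       # loads+stores, % of all instructions
INSTR_PER_CALL = 52.1       # instructions executed per call
REGS_SAVED_PER_CALL = 1.6   # average registers saved per call
SAVE_RESTORE_PCT = 7.0      # save+restore, % of instructions (includes
                            # variables kept in memory for call by address)
GP_RELATIVE_PCT = 15.2      # loads/stores through the global pointer
OTHER_PCT = 4.2             # all other loads/stores


def load_store_pct_without(removed_pct, load_store_pct=LOAD_STORE_PCT):
    """Load+store frequency after deleting removed_pct of all instructions,
    all of which are loads or stores. Both the load/store count and the
    total instruction count shrink by the same amount."""
    return 100.0 * (load_store_pct - removed_pct) / (100.0 - removed_pct)


def pure_save_restore_pct(regs_saved=REGS_SAVED_PER_CALL,
                          instr_per_call=INSTR_PER_CALL):
    """ASSUMPTION (not stated in the post): every saved register costs
    exactly one store on entry and one load on exit."""
    return 100.0 * (2 * regs_saved) / instr_per_call


# The three sub-classes should add up to the overall load+store frequency.
breakdown_total = SAVE_RESTORE_PCT + GP_RELATIVE_PCT + OTHER_PCT

adjusted = load_store_pct_without(SAVE_RESTORE_PCT)
pure = pure_save_restore_pct()
pure_share = 100.0 * pure / LOAD_STORE_PCT

print(f"sum of sub-classes:           {breakdown_total:.1f}% of instructions")
print(f"loads+stores without saves:   {adjusted:.1f}%")
print(f"pure save+restore (assumed):  {pure:.1f}% of instructions")
print(f"  as a share of loads+stores: {pure_share:.1f}%")
```

Under that assumption the numbers hang together: the sub-classes sum to 26.4%, the adjusted frequency rounds to 20.9%, and 2 x 1.6 memory operations per 52.1 instructions comes to about 23.3% of loads+stores.
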

I dare say, though, that trading the MIPS architecture for the MIPS
architecture + windowed-registers - displacements - single-cycle
load/stores, would be a loss.

Regards,
-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen