Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Message-ID: <23150@amdcad.AMD.COM>
Date: 6 Oct 88 23:21:48 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG> <287@granite.dec.com>
Sender: news@amdcad.AMD.COM
Reply-To: tim@crackle.amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 95
Summary:
Expires:
Sender:
Followup-To:

In article <287@granite.dec.com> jmd@granite.UUCP (John Danskin) writes:
| By the way:
|     There is a paper:
|     "Register Windows Vs. General Registers: A Comparison of
|     Memory Access Patterns" by Scott Morrison and Nancy Walker
|     of UC Berkeley.
|
|     Which shows that the MIPS R2000 (aside from running faster) achieves
|     fewer memory references (in almost all cases) than SPARC with all
|     levels of optimization and as many as 7 register windows.

This says fewer overall memory references, but what is missing here is
the ratio of loads and stores to the rest of the instruction mix.  I
wouldn't be surprised if it is just that the Sun compiler is not doing
as good a job in general, and so the total number of instructions
(including loads and stores) increased with respect to the MIPS
compiler.  However, the number of loads and stores as a percentage of
the instruction mix might be lower.

| a) Does anyone know if/where (Earl?) this paper was published?
|    (I got a copy from MIPS people, they love to give it away).
|
| b) Does anybody at SUN have an answer (tell us how they got it all
|    wrong, register windows really DO save memory references).
|
| c) Anybody at AMD (Tim?) want to say something about how burst
|    read/write makes the extra references OK?

Well, I don't know what SUN seems to be doing wrong, but let's try this:

bsd 4.3 nroff with the 4.3 libraries running

	nroff /usr/doc/misc/sysperf/2.t   [a 10655 byte file]

results in:

---------- Pipeline ----------
 32.63% idle pipeline:
	 18.39% Instruction Fetch Wait
	 11.44% Data Transaction Wait
	  0.69% Page Boundary Crossing Fetch Wait
	  0.01% Unfilled Cache Fetch Wait
	  0.00% Load/Store Multiple Executing   <-- Hmm, not much time here!
	  2.07% Load/Load Transaction Wait
	  0.03% Pipeline Latency

---------- Bus Utilization ----------
 Inst Bus Utilization:  63.97%
	8669133 Instruction Fetches

 Data Bus Utilization:   9.75%
	 979830 Loads
	 340998 Stores

---------- Instruction Mix ----------
	  1.86% Calls
	 15.65% Jumps
	 10.73% Loads
	  3.74% Stores
	  4.33% No-ops

---------- Register File Spilling/Filling ----------
	3 Spills        <-- this is why
	0 Fills

 Spill/Fill sizes:
	   1 registers:   0 time(s)   (  0.00%)
	   2 registers:   1 time(s)   ( 33.33%)
	   3 registers:   0 time(s)   (  0.00%)
	   4 registers:   1 time(s)   ( 33.33%)
	   5 registers:   0 time(s)   (  0.00%)
	   6 registers:   0 time(s)   (  0.00%)
	   7 registers:   0 time(s)   (  0.00%)
	   8 registers:   0 time(s)   (  0.00%)
	   9 registers:   0 time(s)   (  0.00%)
	  10 registers:   0 time(s)   (  0.00%)
	  11 registers:   0 time(s)   (  0.00%)
	  12 registers:   1 time(s)   ( 33.33%)
	  13 registers:   0 time(s)   (  0.00%)
	  14 registers:   0 time(s)   (  0.00%)
	  15 registers:   0 time(s)   (  0.00%)
	  16 registers:   0 time(s)   (  0.00%)
	> 16 registers:   0 time(s)   (  0.00%)

So for the entire nroff run, we wrote a total of 18 words out to memory
due to stack cache overflow (2 + 4 + 12 registers over the three
spills).  And we loaded 0 words from the stack due to underflow (this
is because nroff calls exit() while it is still a few procedures down
in the overall call chain).

I would be interested in seeing how this compares to a
non-register-windowed processor, in particular the total number of
loads/stores, the loads/stores as a percentage of instruction mix, and
the number of words of scalar data transferred to/from the stack.

	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)
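
The 18-word figure can be re-derived from the spill histogram in the simulator output. The sketch below is only a cross-check of that arithmetic, not part of the original measurement: the variable and function names are made up for illustration, it assumes one register corresponds to one word (as the post's own count does), and it assumes the spill traffic is already included in the reported store count, which the output does not state.

```python
# Spill histogram copied from the simulator output in the post:
# {registers written per spill: number of spills of that size}.
# Sizes that never occurred are omitted.
spill_histogram = {2: 1, 4: 1, 12: 1}

# Store count copied from the "Bus Utilization" section of the post.
total_stores = 340998


def words_transferred(histogram):
    """Sum size * occurrences, assuming one register is one word."""
    return sum(size * count for size, count in histogram.items())


spill_count = sum(spill_histogram.values())
spill_words = words_transferred(spill_histogram)

# Assumption: spill stores are counted inside total_stores.
spill_share_of_stores = spill_words / total_stores

print(f"{spill_count} spills, {spill_words} words written")
print(f"share of all stores: {spill_share_of_stores:.4%}")
```

Under those assumptions the spill traffic is a tiny fraction of the stores in this run. The same calculation applied to a fill histogram would give the underflow traffic, which is zero here.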