Path: utzoo!attcan!uunet!ncrlnk!ncr-sd!hp-sdd!hplabs!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Message-ID: <23155@amdcad.AMD.COM>
Date: 7 Oct 88 16:46:48 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG> <ANDREW.88Sep28160417@jung.ha <9725@cup.portal.com> <9410@pur-ee.UUCP>
Sender: news@amdcad.AMD.COM
Reply-To: tim@crackle.amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 46

In article <9410@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
| Let's say you have 8 non-lazy windows (and one heck of a lot of valuable die
| space consumed by them).  What do you do when the 9th nested call is made?
| The 10th?  You do a sit-and-wait-for-it burst store, that's what...  would
| you really describe that as being "*without any* memory references at all!"

True, but then non-lazy windows only load or store what is absolutely
required, when it is required.  Lazy operations are continually trying
to load/store ahead. This seems like the real misnomer -- non-lazy
windows are truly lazy (only doing what is required), while "lazy
windows" are quite active.

Consider an "on-demand" load/store window scheme with 4 windows vs. a
"background" load/store window scheme with 2 windows (which was implied
to be all that was required), and suppose the call chain looks like

1 2 3 4 5 6 7 6 7 6 7 6 7 6 7 8 7 6 5 6

i.e., one that spends a lot of time around a local maximum depth
(certainly not atypical).  The "on-demand" window scheme has no saving
or restoring to do while bouncing between levels 7 and 6, because of
the built-in hysteresis.  The "background" load/store scheme, however,
is continually saving and restoring, which wastes memory bandwidth.
What register windows buy is this hysteresis in saving and restoring
stack frames, and the only way to get it is to provide a large number
of windows. Once that is available, background loads/stores don't buy
much, because register file spilling/filling just doesn't occur that
often in real programs (maybe 0.5% of all calls spill).
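
To make the call-chain argument concrete, here is a small Python sketch that counts window saves and restores for the depth trace above. It is a toy model, not a description of any real processor: the function name `simulate`, the oldest-first spill policy, and the "background" policy (after every return, eagerly restore the caller's caller so the next return is free) are all assumptions made for illustration.

```python
from collections import deque

# Call depths from the example chain above, one entry per step.
CHAIN = [1, 2, 3, 4, 5, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 8, 7, 6, 5, 6]


def simulate(chain, windows, background):
    """Return (saves, restores) for a trace of call depths.

    Toy model (an assumption, not any specific machine):
    - `resident` holds the depths whose frames are in the register file,
      oldest first; it is always a contiguous run ending at the current depth.
    - On a call with every window in use, the oldest frame is spilled.
    - On a return, the caller's frame is filled only if it was spilled.
    - If `background` is True, after each return the caller's caller is
      also restored eagerly, so the next return would not have to wait.
    """
    if background and windows < 2:
        raise ValueError("the background policy needs at least 2 windows")
    resident = deque([chain[0]])
    saves = restores = 0
    for prev, cur in zip(chain, chain[1:]):
        if cur == prev + 1:                       # call
            if len(resident) == windows:
                resident.popleft()                # spill the oldest frame
                saves += 1
            resident.append(cur)
        elif cur == prev - 1:                     # return
            resident.pop()                        # free the callee's window
            if not resident:                      # caller was spilled earlier
                resident.append(cur)              # fill it on demand
                restores += 1
            if background and cur > 1 and resident[0] == cur:
                resident.appendleft(cur - 1)      # eager background restore
                restores += 1
        else:
            raise ValueError("depth must change by exactly 1 per step")
    return saves, restores


print("on-demand, 4 windows: ", simulate(CHAIN, 4, background=False))
print("background, 2 windows:", simulate(CHAIN, 2, background=True))
```

Under these assumptions the 4-window on-demand file performs 4 saves and no restores, all forced by call depth exceeding the 4 windows, while the 2-window background file performs 11 saves and 7 restores, 8 of those transfers coming from the 6/7 bouncing alone.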

One other problem with "background" load/stores occurs when memory
operations take more than a single cycle.  In this case, a "background"
memory operation may be started when the memory was otherwise idle, and
right after that, a regular load or store is requested.  The requested
operation must wait for the background one to complete, decreasing
performance (this is what I think Brian Case was talking about when he
mentioned having to predict future operations -- to ensure that this
kind of collision doesn't occur).
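
The collision can be illustrated with a minimal Python model, assuming one non-pipelined memory port on which every operation occupies the port for `latency` cycles and cannot be preempted. The function name and the rule that an opportunistic background operation is simply skipped when the port is busy are hypothetical choices made for this sketch.

```python
def regular_stall_cycles(regular, background, latency):
    """Total cycles that regular loads/stores wait for the memory port.

    Hypothetical model: a single non-pipelined port; each operation holds
    it for `latency` cycles and cannot be preempted. Requests are handled in
    cycle order, regular before background on a tie. A background
    save/restore starts only if the port is idle at its cycle; otherwise it
    is skipped (a real design would retry it later).
    """
    events = sorted([(c, 0) for c in regular] + [(c, 1) for c in background])
    free_at = 0      # first cycle at which the port is idle
    stalls = 0
    for cycle, is_background in events:
        if is_background:
            if cycle >= free_at:                  # port idle: start it
                free_at = cycle + latency
        else:
            start = max(cycle, free_at)           # wait for any op in progress
            stalls += start - cycle
            free_at = start + latency
    return stalls


# A background save at cycle 3 collides with a regular load at cycle 4.
print(regular_stall_cycles([0, 4], [3], latency=3))
print(regular_stall_cycles([0, 4], [], latency=3))
print(regular_stall_cycles([0, 4], [3], latency=1))
```

With a 3-cycle memory, the background save started at cycle 3 delays the regular load issued at cycle 4 by 2 cycles, while the same schedule without the background save, or with single-cycle memory, causes no stall.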

Finally, if background loads/stores are interspersed with the regular
load/store stream, they lose their sequential nature, and thus cannot
take full advantage of any faster burst-mode capability the memory may
provide.
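
The burst-mode point can be sketched the same way. Assume, hypothetically, a memory where a word access that does not continue the previous address costs `first_access` cycles and each sequential follow-on word costs `burst_access` cycles; the costs 4 and 1, the 16-word window, and the addresses below are illustrative only.

```python
def access_cycles(addresses, first_access=4, burst_access=1):
    """Cycles to perform word accesses in the given issue order.

    An access to prev + 1 continues the current burst and costs
    `burst_access`; any other access starts a new burst and costs
    `first_access`.
    """
    cycles = 0
    prev = None
    for addr in addresses:
        if prev is not None and addr == prev + 1:
            cycles += burst_access
        else:
            cycles += first_access
        prev = addr
    return cycles


window_save = list(range(1000, 1016))             # 16 registers, sequential
regular = [2000 + 8 * i for i in range(16)]       # scattered regular loads

grouped = window_save + regular                   # save done as one burst
interleaved = [a for pair in zip(window_save, regular) for a in pair]

print(access_cycles(window_save))   # the save alone
print(access_cycles(grouped))
print(access_cycles(interleaved))
```

In this model the window save costs 19 cycles as a single burst and the grouped stream costs 83, but interleaving the same 32 accesses breaks every burst and costs 128.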

	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)