Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!sun-barr!newstop!sun!amdcad!nucleus!tim
From: tim@nucleus.amd.com (Tim Olson)
Newsgroups: comp.lang.c
Subject: Re: Longjmping back, and back again; Coroutines in C
Message-ID: <28325@amdcad.AMD.COM>
Date: 9 Dec 89 01:17:34 GMT
References: <457@enea.se> <1989Nov21.120938.9200@psuvax1.cs.psu.edu> <20873@unix.cis.pitt.edu> <576@kunivv1.sci.kun.nl>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 71

In article <576@kunivv1.sci.kun.nl> ge@kunivv1.sci.kun.nl (Ge' Weijers) writes:
| kenmoore@unix.cis.pitt.edu (Kenneth L Moore) writes:
|
| >Yup. This is state of the art computater (sic) architecture. This idea
| >arose along with RISC (Reduced Instruction Set Computer) but can be used
| >on RISC or CISC (Complex Instruction Set Computer) machines.
|
| And a sorry state it is. I'd rather have twice the number of registers,
| a store-multiple-registers instruction, and NO register windows.
| It's just as fast, because you mostly save registers that contain
| garbage. The SPARC was designed with the assumption in mind that nobody
| uses recursion anyway. This might be true for C programmers.

You should check out the Am29000. It has 192 visible registers,
load-multiple and store-multiple instructions, and the register
windowing scheme is implemented in software, so that the allocated
register window size is exactly tailored to the frame size required by
the function.

| >Remember that RISC is the result of a statistical study that showed that
| >99% of the instructions used on a CISC machine were a sub-set of some 30
| >(out of maybe 256? on an IBM 360) commands. Also, 10 instructions
| >accounted for 80% and 21 accounted for 95%.
|
| This has nothing to to with register windows.

True.

| >Another aspect that became apparent during these studies was that much
| >of the overhead in a processor was consumed in keeping track of
| >subroutine calls and returns.
|
| Recent studies have also shown that you can do the same in software,
| using a simple analysis.

It isn't simple. You need inter-procedural register allocation
("universal"), and even that is a static analysis -- it doesn't take
into account the dynamic nature of the call tree at runtime, like
hardware register windows do.

| >The original RISC-I guys had a lot of left over chip area and decided
| >that the thing to do with the extra area was to make extra registers.
| >And they shrewdly decided to use the registers to facilitate subroutine
| >handling.
|
| It was a good idea as an experiment. It's a pain if your recursion goes
| 500 deep. But because the SPARC has no other way to save a lot of
| registers in reasonable time, you're stuck with it.
| It also introduces quite a bit of overhead when the OS has to switch
| contexts.

Actually, recursion isn't the problem -- it's large, quick changes in
the stack depth. In fact, many times recursive routines are better on
register-windowed machines, because they only have to save some cached
portion of the stack if they overflow the cache. Consider a dynamic
call chain (of a recursive routine) which looks like:

	1
	  2                           2
	    3   3   3   3   3   3   3
	      4   4   4   4   4   4

a non-register-windowed machine must save and restore registers on each
function call and return, while the register-windowed machine could run
without any register state saved or restored to/from memory.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)