Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site mips.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!whuxlm!harpo!decvax!decwrl!Glacier!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: net.arch
Subject: Re: Stack architectures - why not?
Message-ID: <188@mips.UUCP>
Date: Fri, 13-Sep-85 05:03:34 EDT
Article-I.D.: mips.188
Posted: Fri Sep 13 05:03:34 1985
Date-Received: Sat, 14-Sep-85 17:07:25 EDT
References: <796@kuling.UUCP> <172@myriasa.UUCP> <1094@ulysses.UUCP>
Organization: MIPS Computer Systems, Mountain View, CA
Lines: 86

Steve Bellovin writes, in reply to Chris Gray:
> > I've been told by a couple of people who are normally well informed that
> > a pure stack architecture just isn't practical. They have NOT been able
> > to convince me of this. Anybody out there want to try?
>
> (bunch of comments, which seem pretty good ones)
> My conclusion: the right answer, at least for now, is a machine with a good
> subroutine stack. Other issues, notably the complexity of the instruction
> set, are open.
>

In support of Steve's position, here are some additional points.
As always, it is VERY hard to analyze architectural features in
isolation, i.e., all generalizations are false; nevertheless:

1) COSTS IN FUNDAMENTAL DATA ACCESS TIMES.
For a given level of technology, it always seems faster to do
	add	reg1,reg2,reg3
where this means
	a) select values of reg1 and (if dual-ported reg file, at same
	   time) reg2,
	b) add them,
	c) gate result back into reg3.
Rather than [assuming A is TOS, B is TOS+1]
	add
where this means (as in B5500, for example):
	a) make sure both A & B are valid; if not, make 1-2 memory
	   fetches, and put them into A & B,
	b) add them, putting result back in B,
	c) mark A invalid.

Fewer registers = more memory traffic = faster access to registers at
lowest hardware level.
More registers = less memory traffic = slower access to the registers,
because either
	a) the registers not only have to act like registers, but must
	   also act like a giant shift register to keep the TOS at a
	   definite place, which (at chip level, anyway) gobbles real
	   estate, or
	b) one needs an index register (related to the stack pointer)
	   which points to the TOS location within the register array.
	   This turns out to be painful for the basic machine cycle,
	   because it requires some extra decoding time to find the TOS
	   and the TOS+1 - unless there's great trickery somewhere, I
	   suspect there's an extra adder step required somewhere, which
	   is real ungood in the basic machine cycle.
You may note that most machines that have multiple register sets
allocate them in sets of powers of 2, so that register selection can
occur by concatenating the register number requested with high-order
bits that indicate which register set is used. Allowing variable-size
register windows is possible, but much harder.

2) PIPELINING PROBLEMS [this piece I'm less sure of]
At a given level of technology, one way to make things go faster is
pipelining, or overlapping instruction execution. Faster machines tend
to use more pipeline stages (not just IFETCH & EXECUTE, for example).
Among other things, this requires complex "bypassing", whereby the
results of one operation may dynamically feed into the next, because
the next has already started well before the first finishes. In
general, this is easiest to do for very simple architectures, e.g., the
CDC or CRAY machines, which are load/store architectures with few or no
complex side-effects or exciting address modes. The more complex the
architecture, the more complex becomes the detection and handling of
pipeline hazards; the more complex, the slower.
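
A rough sketch of the hazard detection described above, in Python.
Everything in it is an assumption made for illustration: the Instr
record, the helper names, and especially the model in which every stack
instruction implicitly reads and updates a single stack pointer SP. It
only checks adjacent instruction pairs for read-after-write
dependences, which is far simpler than real bypass logic, and the
register version assumes its operands are already loaded.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: text plus resources read/written."""
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def reg_op(text: str, dst: str, *srcs: str) -> Instr:
    """Three-address register op: reads its sources, writes its destination."""
    return Instr(text, frozenset(srcs), frozenset([dst]))


def stack_op(text: str) -> Instr:
    """Stack op under the assumed model: always reads and updates SP."""
    return Instr(text, frozenset(["SP"]), frozenset(["SP"]))


def raw_hazards(program: List[Instr]) -> List[Tuple[str, str, List[str]]]:
    """List adjacent pairs where the second instruction reads something
    the first one writes (a read-after-write dependence)."""
    found = []
    for prev, cur in zip(program, program[1:]):
        shared = prev.writes & cur.reads
        if shared:
            found.append((prev.text, cur.text, sorted(shared)))
    return found


# (a + b) + (c + d), with a..d assumed already in r1, r2, r4, r5.
REGISTER_CODE = [
    reg_op("add r3,r1,r2", "r3", "r1", "r2"),
    reg_op("add r6,r4,r5", "r6", "r4", "r5"),
    reg_op("add r7,r3,r6", "r7", "r3", "r6"),
]

# The same expression on a pure stack machine.
STACK_CODE = [
    stack_op("push a"), stack_op("push b"), stack_op("add"),
    stack_op("push c"), stack_op("push d"), stack_op("add"),
    stack_op("add"),
]

if __name__ == "__main__":
    for name, code in (("register", REGISTER_CODE), ("stack", STACK_CODE)):
        print(name, len(raw_hazards(code)), "adjacent RAW dependences")
```

Under this toy model, raw_hazards(REGISTER_CODE) flags only the final
add, which needs r6 from the instruction just before it, while
raw_hazards(STACK_CODE) flags every adjacent pair, since each
instruction touches SP.
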
Recall the number of oddities that have popped up over the years on
machines with heavy use of side-effects (like auto-increment
addressing), especially in the presence of memory protection errors;
stack machines are like those, but with auto-increment/decrement on
almost every instruction!

Although some of the original technology arguments have disappeared,
it is worth noting that:
	a) Although many people have been able to cost-reduce
	   architectures over the years, it seems that the Burroughs
	   architectures have been difficult to move unchanged to lower
	   price levels, and at upper performance levels, they've tended
	   to go to multi-processors. There may, of course, be other
	   reasons for the latter, and Burroughs has been in MP for a
	   long time, but it is often the case that you do that when the
	   technology is hard to make go much faster in uni-processors.
	   Current dyadic IBM CPUs are a similar example. [Not a
	   criticism, just an observation; I always admired the B5500
	   and its friends for the vision shown therein.]
	b) One may conjecture why HP is replacing the (stack machine)
	   HP-3000 with Spectrum (by all accounts, a RISC architecture
	   of the load/store variety) for more performance.

BOTTOM LINE: stack machines are elegant in some ways, but very hard to
make either really cheap or really fast. MAYBE current VLSI technology
can overcome some of this, but it's not at all clear.
-- 
-john mashey
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash
DDD:  	415-960-1200
USPS: 	MIPS Computer Systems, 1330 Charleston Rd, Mtn View, CA 94043