Path: utzoo!attcan!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <950@aber-cs.UUCP>
Date: 15 May 89 17:56:19 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth (Disclaimer: my statements are purely personal)
Lines: 137

In article <19413@winchester.mips.COM> mash@mips.COM (John Mashey) writes:

    These are not special case assumptions, although, of course the example
    was put together to illustrate the point.

    1) These days, many machines are reg-reg architectures.

Ahhh. This is a ridiculous argument. Proof-by-numbers is dangerous... Can
over 1 billion Chinese be wrong in using abaci? :-) :-)

    2) Optimizers do find this kind of stuff, especially in FORTRAN codes
    or some of the larger hunks of C. They don't have to be used heavily,
    they just have to be used enough to make it worthwhile.

Again, on reg-reg machines. On Fortran, numeric code, as you say.

    3) I didn't realize the discussion was limited to reg-mem
    architectures: I recall the following part of a posting:

    [ ... ]

Well, well. The discussion was not *limited*, it was in the context of
reg-mem machines. In extending it to RISC machines (about which, as I said,
there is even less data), you did something good. But please don't apply my
statements in a 68020-386 debate to RISC machines. On these, my hunch is
that more registers are OK.

    I did, although I'm sorry it wasn't more. It's not something we have
    much motivation to keep current, unlike the plethora of other
    statistics that we keep around. Should be good paper topic for
    somebody.

Seconded. Might have a go myself... I have a 386 at home, and will have g++
on it soon. Who can give me a reg-reg machine for comparison :-) ?

    It IS important to account for other issues, and there's no reason that
    the answer should be the same for other architectural designs.
    I.e., if you end up with memory ops, rather than load-store, you're
    less motivated to have more registers [you'd think]. On the other
    hand, even there, you sometimes win because of cycle-count (not
    instruction-count) issues, i.e., the latency cycle(s) you get on most
    machines from fetching data from memory (even cache memory).

Wise words. Very agreed. Still, the instruction count/code density issue has
its weight in system performance, of course.

    You'd be surprised, especially in heavily-pipelined machines. You must
    be thinking of counting INSTRUCTIONS, not cycles: most fast (i.e.,
    seriously pipelined) machines cost you a stall cycle if you want to
    fetch something from memory and use it right away, so even on a
    machine with mem->reg operations, you might choose to sometimes
    generate a load, followed by an op, because you might be able to
    rearrange code and get something in to cover the load latency.
    [people sometimes found this on the S/370s].

    > Why don't you like inter-expression register assignments?

    >Well, I like them, as long as the compiler does not do them, but the
    >programmer does, by using explicit "register" declarations....

    OPINION: the above statement sends me back to 15-20 years ago....
    really, if you believe this, you are not keeping up with what's
    happening in the computer business.

This is going back to the "volatile" debate, in the wrong newsgroup. I keep
up, but occasionally I disagree, especially when none of the great and good
over a period of months was able to quote figures to support their opinion.

    >Ahhhhhhhh. What you are saying is that you are using registers as a
    >statically allocated cache, and that this is good not because they are
    >frequently used, but because they would otherwise be frequently
    >saved/restored... Well, well, well.
    >If you want a reg-reg architecture, you

So far we seem to agree (give or take a few registers) that the issue
requires more research, and that C on CISC is more favourable (admittedly)
than FORTRAN on RISC, etc... But here is something very interesting (sorry
for quoting so much, I'd have summarized, but I have become wary of that):

    No. The registers are frequently used. I said the issue was subtle.
    In a leaf routine, (on an R3000, but also, very similar on others')

    1) One need not save/restore the return address

    2) Most (or usually all) of the local variables get grabbed into
    scratch registers that need not be saved.

    3) Now, the stack frame has evaporated, and so we need not move the
    stack pointer around, and we already usually didn't have a frame
    pointer.

This is a very good argument, so far, for an AMD 29k style very large
register file, that becomes a statically managed first level memory, or for
a SPARC style set of (less statically managed) windows. When the register
file is very large, you are really almost dealing with a machine with fast
and slow stores, one of which is addressed e.g. with 8 bit word addresses,
and the other with 32 bit byte addresses. The rules change dramatically. It
looks like old CYBERs (even the problem of swapping in/out the fast memory
on context switches).

I happen to like the MIPS precisely because it has NOT gone this route. I
also like the Transputer because it has taken this route, but seriously
(4kbyte onchip fast memory -- "registers" if you prefer).

    Since leaf routines are often about 50% of the dynamic function calls,
    this is relevant, and a similar, albeit less strong effect happens on
    others. Having plenty of scratch registers also means you can pass a
    reasonable number of arguments in registers, avoiding doing stores in
    the caller, and loads in the callee.
    The point is, that a lot of load/store traffic around function calls
    disappears if you have enough registers and smart compilers (whether
    or not you have windows, which of course, can get rid of a few more).
    Fast machines hate loads, because they usually cost you stall cycles.

    >pay the price, you take your chances. Me, my idea of RISC is a (mostly) zero
    >address architecture with 8/12 bit instructions, and four (to avoid extra
    >push/pop pairs in multiplexing a single one for the up to four independent
    >computations) arith stacks.

    This is a fine OPINION; the current round of new computer
    architectures has voted widely, and decisively, for load-store
    machines with "plenty" of registers addressable at any point in the
    program. (plenty = usually 32, as in HP PA, MIPS, SPARC, MC88000,
    i860).

Again, proof-by-numbers. The current round of computer users, it could be
said, have voted decisively for segmented CISC architectures :-( :-(.

Also, it is not entirely an opinion; the only novelty I am citing is having
multiple arith stacks, but, while you say:

    In particular, although I've always admired the old B5500, it seems
    that zero-address architectures are difficult to build to really go
    fast...

there is the little matter of a few FACTs, called CRISP, NOVIX and
TRANSPUTER, that seem to be always forgotten (not to mention the 32532,
which has the extremely embarrassing property of being a simple, well
designed, reg-mem CISC that outruns most RISCs around...) by reg-reg and
otherwise RISC designs.

Declaration of prejudice: I am all (well, 80% :->) for RISC. Of these I find
MIPS more admirable than most. The idea of simple, fast, reliable, is what I
like. It is obvious that I disagree from my armchair that the benefits of
RISC are there because of ALL the design choices of most RISCs.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
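
A toy illustration of the load-latency point quoted in the article (counting
cycles rather than instructions), offered as a sketch and not as a model of
any real machine. It assumes, purely for illustration, that every
instruction issues in one cycle and that an instruction reading a register
loaded by the immediately preceding instruction pays one stall cycle. The
names Insn and count_cycles and the register numbers are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Insn(NamedTuple):
    kind: str               # "load" or "alu" (hypothetical toy encoding)
    dst: int                # destination register number
    srcs: Tuple[int, ...]   # source register numbers


def count_cycles(code: Sequence[Insn]) -> int:
    """Count cycles for straight-line code under the assumed toy model:
    one cycle per instruction, plus one stall cycle when an instruction
    uses the result of the load that immediately precedes it."""
    cycles = 0
    prev = None
    for insn in code:
        cycles += 1  # issue cycle
        if prev is not None and prev.kind == "load" and prev.dst in insn.srcs:
            cycles += 1  # load-use stall
        prev = insn
    return cycles


# r3 = mem + mem, and an independent r6 = r4 + r5 (r10 is an address base).
naive = [
    Insn("load", 1, (10,)),
    Insn("load", 2, (10,)),
    Insn("alu", 3, (1, 2)),   # uses r2 right after its load: stalls
    Insn("alu", 6, (4, 5)),
]
# Same four instructions, with the independent add moved up to cover
# the latency of the second load.
scheduled = [naive[0], naive[1], naive[3], naive[2]]

print(count_cycles(naive), count_cycles(scheduled))  # prints: 5 4
```

Both sequences contain the same four instructions; only the cycle count
differs, which is the sense in which instruction counts and cycle counts can
disagree under these assumptions.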