Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!bloom-beacon!apple!rutgers!bellcore!texbell!killer!elg
From: elg@killer.Dallas.TX.US (Eric Green)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <8082@killer.Dallas.TX.US>
Date: 12 May 89 01:51:53 GMT
References: <927@aber-cs.UUCP>
Distribution: eunet,world
Organization: The Unix(R) Connection, Dallas, Texas
Lines: 105

in article <927@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) says:
> In article <19063@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>>     
>>     The simplicity of source statements has little to do with the number of
>>     registers desirable, unless the only compiler you have generates code
>>     on a statement-by-statement basis only, i.e., no optimization.
>> 					    ^^^^^^^^^^^^^^^^^^^^^
>> Optimization is not just (and maybe even not most importantly) inter
>> statement...
>> 
>>     For example, consider a typical RISC (i.e., load/store), and the C stmts:
>>     	a = b + 5;
>>     	c = b + 7;
>> 	[ .... ]
>> 
> Your example works, but under special case assumptions: that you are working on
> a reg-reg architecture, whereas we were discussing reg-mem ones; that putting
> all three a,b,c in registers is worthwhile because they are going to be used
> heavily in other parts of the program.

I program a 68000 a lot. I suspect a 68000 is a fairly typical
reg-memory machine. I write a lot of "C" code. The "C" compiler I'm
using is fairly PCC-like, i.e. loses all register values between
expressions. In performance-critical portions of the code, I end up
modifying the assembly language output to keep as many values in
registers as possible. This is especially important with globals,
because they can't be put into registers normally.  Presto, fewer
instruction bytes, as much as 20% improvement in performance. If only
the compiler did it for me, eh?
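
A minimal sketch of the same trick done at the "C" level instead of in
the assembly output. The names total, sum_direct and sum_cached are
made up for illustration; the assumption is a PCC-like compiler that
honors "register" for locals but always goes to memory for globals.

```c
#include <stddef.h>

/* Hypothetical global. A PCC-style compiler reads and writes it in
   memory in every expression that mentions it. */
long total;

/* Straightforward version: each iteration touches the global in memory. */
void sum_direct(const int *v, size_t n)
{
    size_t i;

    total = 0;
    for (i = 0; i < n; i++)
        total += v[i];
}

/* Hand-tuned version: accumulate in a local that can live in a
   register, then store to the global once at the end. */
void sum_cached(const int *v, size_t n)
{
    register long t = 0;
    register size_t i;

    for (i = 0; i < n; i++)
        t += v[i];
    total = t;
}
```

Both functions leave the same value in total; the second just gives a
naive compiler a chance to keep the running sum out of memory inside
the loop. A global optimizer should make the two equivalent.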

But wait -- I've used such a machine -- a Pyramid 90x. It has a fairly
state-of-the-art compiler with a global optimizer, and basically
ignores "register" declarations. When I try to go into the assembly
code there and speed it up, guess what? I can't improve the register
allocation one iota. 

Anecdotal evidence, certainly. But good enough for me to conclude
that having lots of registers and a good global optimizer is a Good
Thing. People who disagree with that must have never looked at the
assembler-language output of their code on different machines....

> used little. In a reg-mem architecture little-used variables in memory do not
> carry costs as high when you use them.

Foo. If you use the variable three times, keeping it in a register
saves 4 memory fetches (2 addresses, 2 data) vs. leaving it in memory.
No matter what kind of machine you're using.
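
The arithmetic behind that count, as a toy model. It assumes (my
assumption, not a measurement) that every use of a memory-resident
variable costs one fetch for its address out of the instruction stream
plus one for the data, and that a register-resident variable pays that
cost exactly once, for the initial load. The function names are
hypothetical.

```c
/* Assumed cost of one memory-operand use: address fetch + data fetch. */
#define FETCHES_PER_MEMORY_USE 2

/* Fetches when the variable stays in memory for every use. */
unsigned fetches_in_memory(unsigned uses)
{
    return uses * FETCHES_PER_MEMORY_USE;
}

/* Fetches when the variable is loaded once and then kept in a register. */
unsigned fetches_in_register(unsigned uses)
{
    return uses > 0 ? FETCHES_PER_MEMORY_USE : 0;
}

/* Memory fetches avoided by keeping the variable in a register. */
unsigned fetches_saved(unsigned uses)
{
    return fetches_in_memory(uses) - fetches_in_register(uses);
}
```

With three uses the model gives 6 fetches from memory against 2 for
the single load, i.e. fetches_saved(3) is 4: two addresses and two
data words.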


>>     Why don't you like inter-expression register assignments?
> Well, I like them, as long as the compiler does not do them, but the
> programmer does, by using explicit "register" declarations. But

Have you ever looked at the output of a "C" compiler, and compared the
output of GCC to PCC (i.e. optimizing compiler vs. non-optimizing)?
Re-compiling a program with GCC is a sure-fire way of speeding it up
by 20% ;-).

> pay the price, you take your chances. Me, my idea of RISC is a (mostly) zero
> address architecture with 8/12 bit instructions, and four (to avoid extra
> push/pop pairs in multiplexing a single one for the up to four independent
> computations) arith stacks.

Uh, excuse me, have you ever read the various RISC papers? Reaching
over to my handy boxful of 6x5 cards.... urk, can't find the one I
wanted, the 1986 David Patterson overview in CACM, but it should be
easy enough to find. I assume you know how to use the indexes in your
library? Look up a few RISC papers, then come back. Then you'll be
able to argue convincingly about the various merits of register-register vs.
register-memory vs. stack models (hints: pipelining, locality of
reference >80% for code memory, cache size, program-fetch bandwidth
vs. data fetch bandwidth, ...). If you have something to say that
wasn't said in the latest CISC/RISC wars, I'm sure everybody would
appreciate hearing it -- but a word of warning, not much was left
UNSAID.

(the biggest argument of the RISC guys is that, because of locality of
reference program-wise, program memory bandwidth is almost
unlimited... it's data-memory bandwidth that's now the main limit,
which is why they want lots of registers and mostly
register-to-register operations. CISC folks, of course, say that those
qualities certainly aren't restricted only to RISC.... but, anyhow,
your reg-mem architecture is blown all to heaven by what the RISC
folks have actually PRODUCED IN SILICON, i.e. it is NOT the bemused
speculations of a goggle-eyed grad student).

> Note that it is assumed that RISC == reg-reg, and that load-store == reg-reg;
> neither of these equations is necessarily true, as one could have RISC ==
> stack-stack or load-store == stack-stack... 

stack-stack must be reg-reg in order to be adequately fast. AT&T had a
novel architecture some years back called CRISP, which, if I recall,
was a stack-oriented RISC machine. AT&T eventually decided to use
SPARC for their RISC processor instead, in the event they built a
RISC-based computer... I don't really know what became of all that,
alas. It's possible to organize registers in a number of different,
novel ways. stack-stack saves on program-memory bandwidth, at the cost
of reducing flexibility of register use. But note that program-memory
bandwidth is the one thing there's no shortage of.
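
A rough illustration of the "flexibility" point, using the two
statements from the quoted example (a = b + 5; c = b + 7). Both
execution models below are toy assumptions of mine, not descriptions
of CRISP or of any real instruction set, and every name in the code is
hypothetical. The stack model deliberately has no dup-style
instruction; a stack ISA that had one could avoid the second load
of b.

```c
static int mem_b;         /* the variable b, living in data memory */
static unsigned b_loads;  /* how many times b was read from memory */

/* Every read of b from memory goes through here so it can be counted. */
static int load_b(void)
{
    b_loads++;
    return mem_b;
}

/* Register model: load b once, reuse the register for both sums.
   Returns the number of data-memory reads of b. */
unsigned run_register_model(int b, int *a, int *c)
{
    int r1;

    mem_b = b;
    b_loads = 0;
    r1 = load_b();   /* load r1, b              */
    *a = r1 + 5;     /* add  r2, r1, 5; store a */
    *c = r1 + 7;     /* add  r3, r1, 7; store c */
    return b_loads;
}

/* Pure stack model without dup: each expression pushes its own
   operands, so b is fetched from memory once per statement. */
unsigned run_stack_model(int b, int *a, int *c)
{
    int stack[4];
    int sp = 0;

    mem_b = b;
    b_loads = 0;

    stack[sp++] = load_b();           /* push b */
    stack[sp++] = 5;                  /* push 5 */
    sp--; stack[sp - 1] += stack[sp]; /* add    */
    *a = stack[--sp];                 /* pop a  */

    stack[sp++] = load_b();           /* push b */
    stack[sp++] = 7;                  /* push 7 */
    sp--; stack[sp - 1] += stack[sp]; /* add    */
    *c = stack[--sp];                 /* pop c  */

    return b_loads;
}
```

In this sketch the register model reads b from memory once and the
stack model reads it twice, for the same results in a and c. The stack
encoding needs no register numbers in its instructions, which is where
the program-memory saving would come from.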

--
|    // Eric Lee Green              P.O. Box 92191, Lafayette, LA 70509     |
|   //  ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg     (318)989-9849     |
|  //    Join the Church of HAL, and worship at the altar of all computers  |
|\X/   with three-letter names (e.g. IBM and DEC). White lab coats optional.|