Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!seismo!rochester!ritcv!cci632!rb
From: rb@cci632.UUCP (Rex Ballard)
Newsgroups: net.arch
Subject: What RISC is REALLY all about!
Message-ID: <193@cci632.UUCP>
Date: Mon, 7-Jul-86 20:41:16 EDT
Article-I.D.: cci632.193
Posted: Mon Jul  7 20:41:16 1986
Date-Received: Fri, 11-Jul-86 04:43:18 EDT
References: <475@elmgate.UUCP> <789@petsd.UUCP> <481@elmgate.UUCP>
Reply-To: rb@ccird1.UUCP (Rex Ballard)
Organization: CCI, Rochester Development, Rochester, NY
Lines: 104
Summary: parallelism, primitives.

In article <481@elmgate.UUCP> jdg@elmgate.UUCP (Jeff Gortatowsky) writes:
>
>That's not what I meant.  I was just citing an example.  The same DBcc
>instruction could, of course, be 3 separate CISC instructions as well, using
>32-bit quantities. If the CISC vs. RISC subject was the same as saying "the
>68000 vs. RISC", I might better understand the hoopla associated with RISC. 
>Indeed most of the talk is comparing less well known mini or mainframe
>computers, not micros. One exception to this is the VAX.  It's classified
>as a mini (CISC) but is, of course, very well known.

Sure, RISC is not new.  It has existed in larger architectures for some
time, and those machines provide a good source of background.

>In short I was not asking what is wrong with the 68000 CPU's.  Just whether
>RISC is REALLY an improvement in computer design?  If so, why?

Yes and no.  This is like asking if an ALL ROM operating system is better
than an all RAM operating system.  There are some tradeoffs.

Remember the early Apple IIs, TRS-80s, and Ataris?  These boxes, among
others, had a large ROM operating system, command language, utilities, and
drivers.  This was primarily because these early boxes were cassette-based
systems.

Now, with floppies and hard disks as part of the "minimum package", it is
less desirable to have the OS rommed in.  Even those systems that do carry
a lot of ROM tend to use it as a library of primitives, rather than as the
"final system interface".

These same factors are now becoming significant at the CPU level.  RAM
cache is a major factor, as is pipelining.  When it is possible to get an
order of magnitude faster "local storage", even in small quantities, the
costs and benefits become worth considering.

As you pointed out earlier, much of the RISC architecture comes right
off the old minis and mainframes.  Anybody remember when a computer was
really powerful if it had a 4K "core" (magnetic beads), a large "drum",
and a tape drive?

Just as those old machines had special "controllers" which managed, loaded,
and stored data to/from these successively slower media, the RISC chips
tend to have sub-chip level "controllers" of their own.  Remember, CPUs
and "controllers" are just different flavors of finite state machines.

Fortunately, most of the mechanics have been hidden using modern hardware
and software techniques, again mostly taken from minis and mainframes.

>It was always
>my feelings that, if a CPU manufacturer were to write the language compilers
>first, THEN generate a CPU design to run it, we'd all be a lot happier.  Yes,
>that sounds like a CISC design. But, am I wrong saying that INMOS took 
>that approach with the TRANSPUTER?  Am I wrong in assuming the TRANSPUTER 
>is a RISC CPU?  If I'm not wrong then RISC vs CISC seems like a useless
>argument, as INMOS' product proves that the CPU should fit the language it
>runs, not the other way around. CISC or RISC notwithstanding.

One of the nice features of the TRANSPUTER is that the primitives used in
OCCAM could be used in other languages.  In addition, other primitives could
be added, changed, or deleted to make a super-fast 'C' machine, or a Forth
machine, or Prolog, or Smalltalk, or Lisp, or ??.

In some ways, RISC is a possible LOSE.  You have been looking at only the
functionality of a single instruction vs. three instructions.  In some
cases such as the "frame save" or "poly" or "context switch" operations,
it may even be necessary to add the overhead of a "call" instruction,
but the call instruction can also be easily and quickly optimized.
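
To put rough numbers on the one-instruction-vs-three comparison, here is a
minimal sketch that only counts instruction fetches for a counted loop.
The function name, the 4-instruction loop body, and the 2000 iterations
are assumptions picked for illustration, not measurements of any real
chip, and the count ignores caching and pipelining entirely.

```python
def fetches_for_loop(iterations, body_instructions, loop_control_instructions):
    """Total instruction fetches for a counted loop, with no cache at all."""
    return iterations * (body_instructions + loop_control_instructions)


# Hypothetical loop: 4 instructions of real work per pass, 2000 passes.
fused = fetches_for_loop(2000, 4, 1)  # one DBcc-style instruction closes the loop
split = fetches_for_loop(2000, 4, 3)  # decrement, test, branch as separate steps

print(fused, split)  # 10000 14000
```

The fused form saves two fetches per pass.  Whether that matters depends
on how expensive a fetch is, which is where the cache comes in.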

Most programmers today are "top down" trained, and not used to thinking
in terms of primitives.  RISC, however, makes thinking in terms of
primitives even more significant, not only at the compiler level but at
the project level.

Here is where one begins to see the advantages of RISC.  When it becomes
possible to pack more and more "primitives" into the system, and have them
automatically arrange themselves into the most efficient configuration
within a few hundred cycles, things start to speed up at the application
level.

Ironically, your DBcc example is interesting.  What would you put inside
the loop?  Suppose you could use 16 or 32 bytes of instructions inside the
loop.  Suppose in addition, that you wanted to loop 2000 times.  Now in
a CISC, you might want to put the DBcc in "ROM", but how about strncpy,
and the rest of libc.a?  Now since you only need at most a few hundred
bytes, you can afford to buy and use very FAST ram.  You could build
a little 4K cache and have about 5 levels of nesting, without ever
seeing a cache miss (which would slow the CPU down with wait cycles),
because the "pre-fetcher" is loading the next subroutine while you're
inside a "sister" subroutine's loop.  Not only that, but you've got more
time to prefetch from slower memory, because you are in a loop.
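
A back-of-the-envelope sketch of that prefetch argument follows.  Only
the 32-byte loop and the 2000 iterations come from the example above.
The 4-byte instructions, one cycle per instruction, 8 cycles per word of
slow memory, and the 256-byte "next subroutine" are all assumed values,
and prefetch_slack is a made-up name.

```python
def prefetch_slack(loop_bytes, iterations, next_routine_bytes,
                   bytes_per_instruction=4, cycles_per_instruction=1,
                   slow_memory_cycles_per_word=8, word_bytes=4):
    """Return (cycles spent inside the loop, cycles to prefetch the next routine)."""
    instructions_per_pass = loop_bytes // bytes_per_instruction
    loop_cycles = iterations * instructions_per_pass * cycles_per_instruction
    words = -(-next_routine_bytes // word_bytes)  # ceiling division
    prefetch_cycles = words * slow_memory_cycles_per_word
    return loop_cycles, prefetch_cycles


# 32-byte loop body run 2000 times, while a 256-byte routine is prefetched.
loop_cycles, prefetch_cycles = prefetch_slack(32, 2000, 256)
print(loop_cycles, prefetch_cycles)  # 16000 512
```

Under those assumptions the prefetcher needs only a small fraction of
the time the CPU spends in the loop, so the next routine is already
sitting in fast RAM when it is called.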

If you want to watch the RISC system get a little crazy, get a massively
large routine of, say, 6K inside one massive loop, loaded with "macro
expansions", and watch performance drop through the floor :-).
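
The difference between the two loops can be sketched with a toy cache
model.  This assumes a direct-mapped 4K instruction cache with 16-byte
lines and no prefetcher at all.  Those choices, and the name
loop_miss_rate, are illustrative assumptions, not a description of any
particular RISC chip.

```python
def loop_miss_rate(loop_bytes, passes, cache_bytes=4096, line_bytes=16):
    """Fraction of instruction-line fetches that miss in a direct-mapped cache."""
    num_slots = cache_bytes // line_bytes
    slots = [None] * num_slots           # slot index -> line number it holds
    loop_lines = -(-loop_bytes // line_bytes)  # ceiling division
    misses = 0
    for _ in range(passes):
        for line in range(loop_lines):   # straight-line code, fetched in order
            slot = line % num_slots
            if slots[slot] != line:      # something else (or nothing) is cached here
                misses += 1
                slots[slot] = line
    return misses / (passes * loop_lines)


small = loop_miss_rate(32, 2000)         # 32-byte loop: misses only on the first pass
big = loop_miss_rate(6 * 1024, 2000)     # 6K loop in a 4K cache: lines evict each other
print(small, big)
```

In this model the small loop misses twice in total, while roughly two
thirds of the big loop's fetches miss on every pass, because the first
2K and the last 2K of the loop keep throwing each other out of the same
cache slots.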

CISC on the other hand, can save you a few FETCH cycles.  Of course,
there is nothing to stop you from putting these RISC features in a
CISC, except time, money, chip and board space, effort co-ordinating
arbitrary length instructions with co-processors, extra delays for
microcode synchronization, data-path turn-around, microbus arbitration,
external bus arbitration, ... :-).

Seriously, there may be features of CISC that will need to be incorporated
into RISC.  But as these features are added, they will probably be done
via hardware, rather than microcode.  Things like address calculations,
multiply and divide, TLB changes, and pre-fetchers may end up becoming
smarter as individual units.