Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site lanl.ARPA
Path: utzoo!linus!philabs!cmcl2!lanl!jlg
From: jlg@lanl.ARPA
Newsgroups: net.arch
Subject: Re: RISC processors
Message-ID: <16438@lanl.ARPA>
Date: Mon, 19-Nov-84 22:35:28 EST
Article-I.D.: lanl.16438
Posted: Mon Nov 19 22:35:28 1984
Date-Received: Thu, 22-Nov-84 06:21:23 EST
References: <641@watdcsu.UUCP>, <267@idi.UUCP> <4640@utzoo.UUCP>
Sender: newsreader@lanl.ARPA
Organization: Los Alamos National Laboratory
Lines: 87

It is obviously possible to build a RISC machine that is in the same class
as a VAX.  But why would you want to when some RISC-like machines have been
running for years MUCH FASTER than a VAX?  These are the CDC machines, the 
CRAY machines, and the more recent vector processor machines 'from the east.'

For example, the CRAY machine is VERY RISC-like.  There are two data addressing
modes corresponding to the VAX 'literal mode' and the VAX 'displacement mode'.
There are two branch addressing modes corresponding to the VAX 'literal mode'
and the VAX 'register mode'.  No instructions other than loads, stores, and
branches address the memory.  All the other instructions use 'register mode'
for their operands, mostly in three-address form.  Contrary to the remarks of
previous submitters, there is no difficulty achieving very high speed
floating point arithmetic on a RISC-like machine.  In fact, the floating
point units on the CRAY-1S are just one clock slower than their
integer counterparts.
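To make the load/store discipline concrete, here is a small interpreter for a made-up register machine. The opcode names (LOADI, LOAD, STORE, ADD, MUL) and the tuple encoding are hypothetical illustrations, not the actual CRAY instruction set. The only point is that LOAD and STORE are the sole instructions that touch memory, while arithmetic works on registers in three-address form.

```python
from typing import List, Tuple

# Hypothetical instruction encoding: (opcode, d, a, b).
# The meaning of the fields depends on the opcode (see run below).
Insn = Tuple[str, int, int, int]

def run(program: List[Insn], memory: List[int], nregs: int = 8) -> List[int]:
    """Execute a hypothetical load/store program; only LOAD and STORE
    touch memory. Returns the final register file."""
    regs = [0] * nregs
    for op, d, a, b in program:
        if op == "LOADI":     # literal mode: regs[d] = b
            regs[d] = b
        elif op == "LOAD":    # displacement mode: regs[d] = memory[regs[a] + b]
            regs[d] = memory[regs[a] + b]
        elif op == "STORE":   # displacement mode: memory[regs[a] + b] = regs[d]
            memory[regs[a] + b] = regs[d]
        elif op == "ADD":     # three-address, registers only
            regs[d] = regs[a] + regs[b]
        elif op == "MUL":     # three-address, registers only
            regs[d] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs

# mem[10] = mem[8] * mem[9] + mem[8]
mem = [0] * 16
mem[8], mem[9] = 6, 7
prog = [
    ("LOADI", 1, 0, 0),    # r1 = 0 (base address)
    ("LOAD", 2, 1, 8),     # r2 = mem[r1 + 8]
    ("LOAD", 3, 1, 9),     # r3 = mem[r1 + 9]
    ("MUL", 4, 2, 3),      # r4 = r2 * r3
    ("ADD", 5, 4, 2),      # r5 = r4 + r2
    ("STORE", 5, 1, 10),   # mem[r1 + 10] = r5
]
run(prog, mem)
print(mem[10])  # 48
```

Every memory reference in the program is an explicit LOAD or STORE; the MUL and ADD never wait on memory.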

There are several differences between the CRAY machines and the RISC machines
proposed by Patterson and others.  The most important are the lack of
orthogonality in the instruction set (although the CRAY-2 promises to fix
this deficiency to some extent) and the lack of a high speed context switching
mechanism.  This last point is offset somewhat by the ability to 'block load'
or 'block store' certain register sets (unfortunately, the present compilers
don't make particularly good use of this feature).  Another major difference
between the two types of machines is the presence in the CRAY of several
different functional units, each with different timing characteristics.
This requires extra logic to reserve each result register until the operation
that writes it has completed.
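A rough sketch of the register reservation idea, assuming in-order issue of at most one instruction per clock: each instruction waits until its source registers are ready and its destination is no longer reserved, then reserves the destination for as many clocks as its functional unit takes. The unit names, register names, and latencies below are made up for illustration and are not CRAY timings.

```python
# Hypothetical functional-unit latencies in clocks (not real CRAY timings).
LATENCY = {"logical": 1, "int_add": 3, "fp_add": 6, "fp_mul": 7}

def issue_clocks(program):
    """program: list of (unit, dest, src1, src2) register-name tuples.
    Issue strictly in order, at most one instruction per clock. An
    instruction waits until both sources are ready and its destination is
    no longer reserved, then reserves the destination until its result
    arrives. Returns the clock at which each instruction issues."""
    ready = {}    # register -> clock at which its pending result is available
    clock = 0
    issued = []
    for unit, dest, src1, src2 in program:
        clock = max(clock, ready.get(src1, 0), ready.get(src2, 0),
                    ready.get(dest, 0))
        issued.append(clock)
        ready[dest] = clock + LATENCY[unit]   # reserve dest until result arrives
        clock += 1                            # next issue slot
    return issued

prog = [
    ("fp_mul", "r1", "r2", "r3"),    # r1 = r2 * r3
    ("int_add", "r4", "r5", "r6"),   # independent of the multiply
    ("fp_add", "r7", "r1", "r4"),    # must wait for r1
]
print(issue_clocks(prog))  # [0, 1, 7]
```

The floating add has to wait for the multiply's result in r1, so it issues at clock 7, while the integer add, which needs nothing from the multiply, issues at clock 1.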

So far I have described only the scalar part of the CRAY machine, and
for good reason.  Even without vector operations, the CRAY is MUCH faster
than a VAX.  I suspect that a VLSI version of the CRAY scalar instruction
set would be able to outperform a VAX built with the same technology.
The advantages of the reduced instruction set combined with the simpler
memory interface (only two addressing modes with NO virtual memory support)
would allow the 'micro CRAY' to be clocked at much higher rates than such
a VAX.  Of course, I doubt that the CRAY architecture could be put on a
single chip with today's technology, but it could probably be done with a
small set of chips for each functional unit.

Programming a RISC machine is simple compared to programming a CISC machine.
Far from being 'woefully inadequate', the RISC type of machine seems just right.
In a CISC machine there are usually about half a dozen different ways of
performing any given function, and the most obvious is usually NOT the fastest,
or even close.  On a RISC machine, the most obvious code sequence is almost
always the fastest; it may be the ONLY obvious code sequence.  After 17
years of assembly coding I came to the conclusion that the CRAY instruction
set was the easiest to use of any machine I have seen.  And after two years
of compiler maintenance on the CRAY I concluded that the instruction set
was the easiest to write a compiler for as well (the CRAY compiler is such
a poorly written thing that it would probably never have even worked on 
another machine).  The only really difficult part is scheduling vector
operations, which became much easier on the new X/MP machines.

A word needs to be said about the lack of addressing modes and virtual memory.
At the speeds at which RISC machines will run (not the demo units made from
MOS, but the real production chips that I hope will come out), memory will
be the slowest component of the system.  On the CRAY, only the reciprocal
approximation is slower than a memory fetch; all other operations are at least
twice as fast (integer add is 7 times as fast, logical operations are 14
times as fast).  Staged memory helps (several fetches or stores can be in
progress simultaneously), but all the other functional units are staged as well.
It makes sense to limit memory traffic to just loads and stores so that
other functional units don't end up waiting for memory references.  It also
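A back-of-the-envelope model of what staging (pipelining) memory buys: n independent loads take n times the memory latency if each must finish before the next starts, but only about one latency plus one issue slot per additional load when several can be in flight. The 14-clock latency in the example is an assumed figure, not a quoted CRAY timing.

```python
def load_clocks(n_loads, latency, staged, issue_interval=1):
    """Clocks until the last of n independent loads completes.
    Unstaged: each load must finish before the next one starts.
    Staged: a new load can start every issue_interval clocks."""
    if n_loads == 0:
        return 0
    if staged:
        return (n_loads - 1) * issue_interval + latency
    return n_loads * latency

# Assumed 14-clock memory latency, 64 independent loads.
print(load_clocks(64, 14, staged=False), load_clocks(64, 14, staged=True))  # 896 77
```

Staging improves throughput, but each individual load still takes the full latency, and since the arithmetic units are staged too, memory remains the relative bottleneck.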
makes sense to limit the number of addressing modes so that memory traffic
doesn't get even slower due to the extra checking and circuitry in the
memory interface.  If memory traffic is slow, then traffic to the secondary
storage (disk or whatever) is REALLY SLOW.  The data transfer rate for
the standard CRAY drive (CDC DD-29) is 38.7x10^6 bits/sec, and the sector
size is 512 words (64 bits/word), or 32,768 bits.  Transferring one sector
takes a little less than a millisecond, or about 68,000 cpu cycles!!  This
doesn't even count seek time, rotational latency, or scheduling the traffic
with the channel.  Obviously, the operating system would have to suspend your
task until the page had been loaded, and it is also clear that no amount of
'lookahead' in the paging scheme could significantly improve its performance.
The solution is not to page, but to provide a very large amount of central memory.
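The sector arithmetic can be checked directly. The 12.5 ns clock period below is an assumption (it is the CRAY-1 clock period, and it reproduces the 68,000-cycle figure); on a faster-clocked machine the cycle count would be even larger.

```python
# Disk figures quoted above; the clock period is an assumption.
BITS_PER_WORD = 64
WORDS_PER_SECTOR = 512
TRANSFER_RATE_BPS = 38.7e6   # CDC DD-29 transfer rate, bits per second
CLOCK_PERIOD_S = 12.5e-9     # assumed CPU clock period (CRAY-1)

def sector_transfer(words=WORDS_PER_SECTOR, bits_per_word=BITS_PER_WORD,
                    rate_bps=TRANSFER_RATE_BPS, clock_s=CLOCK_PERIOD_S):
    """Return (bits, seconds, cpu_cycles) for moving one sector.
    Ignores seek time, rotational latency and channel scheduling."""
    bits = words * bits_per_word
    seconds = bits / rate_bps
    return bits, seconds, seconds / clock_s

bits, seconds, cycles = sector_transfer()
print(f"{bits} bits, {seconds * 1e6:.0f} us, about {cycles:,.0f} cycles")
# 32768 bits, 847 us, about 67,737 cycles
```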
With large central memory, there is always enough room for code (it's small)
but data may still need to be kept on secondary storage.  Fortunately, it's
usually possible to write code which anticipates its data needs and issues
reads and writes (asynchronous of course) long in advance of the use of that
data.  Short of that, reads and writes don't do that much worse than paging
would have done anyway.
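A simple timing model shows why reads issued well ahead of use can hide most of the disk time. It assumes all reads are issued up front (the large central memory provides the buffer space), that one channel services them one after another, and that the program processes blocks in order; the block counts and times in the example are arbitrary units. With no read-ahead, each block costs a full read plus its processing, which is roughly what demand paging would cost.

```python
def synchronous_time(n_blocks, read_time, compute_time):
    """Each block is read only when it is needed, then processed (no overlap)."""
    return n_blocks * (read_time + compute_time)

def read_ahead_time(n_blocks, read_time, compute_time):
    """All reads issued in advance on a single channel: block i has arrived
    at (i + 1) * read_time. Processing of block i starts once the block has
    arrived and block i - 1 has been processed."""
    t = 0
    for i in range(n_blocks):
        arrived = (i + 1) * read_time
        t = max(t, arrived) + compute_time
    return t

print(synchronous_time(10, 1, 2), read_ahead_time(10, 1, 2))  # 30 21
```

When processing dominates, the total approaches one read plus the processing time, so the disk cost is almost entirely hidden; when the disk dominates, the total approaches the bare transfer time.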

I'm looking forward to the first commercial RISC chips (or chip sets).  I
expect that, to be competitive, they will have several functional units (each
staged), only one or two addressing modes, a large central memory requirement,
and no virtual addressing capability.  With this combination, I think RISC
could outrun any other small computer available.