Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site lanl.ARPA
Path: utzoo!linus!philabs!cmcl2!lanl!jlg
From: jlg@lanl.ARPA
Newsgroups: net.arch
Subject: Re: RISC processors
Message-ID: <16438@lanl.ARPA>
Date: Mon, 19-Nov-84 22:35:28 EST
Article-I.D.: lanl.16438
Posted: Mon Nov 19 22:35:28 1984
Date-Received: Thu, 22-Nov-84 06:21:23 EST
References: <641@watdcsu.UUCP>, <267@idi.UUCP> <4640@utzoo.UUCP>
Sender: newsreader@lanl.ARPA
Organization: Los Alamos National Laboratory
Lines: 87

It is obviously possible to build a RISC machine that is in the same class as a VAX. But why would you want to when some RISC-like machines have been running for years MUCH FASTER than a VAX? These are the CDC machines, the CRAY machines, and the more recent vector processor machines 'from the east.'

For example, the CRAY machine is VERY RISC-like. There are two data addressing modes corresponding to the VAX 'literal mode' and the VAX 'displacement mode'. There are two branch addressing modes corresponding to the VAX 'literal mode' and the VAX 'register mode'. No instructions other than loads, stores, and branches address the memory. All the other instructions use 'register mode' for their operands, mostly three address code.

Contrary to the remarks of previous submitters, there is no difficulty achieving very high speed floating point arithmetic on a RISC-like machine. In fact the floating point units on the CRAY-1s machine are just one clock slower than their integer counterparts.

There are several differences between the CRAY machines and the RISC machines proposed by Patterson and others. The most important are the lack of orthogonality in the instruction set (although the CRAY-2 promises to fix this deficiency to some extent) and the lack of a high speed context switching mechanism.
This last point is offset somewhat by the ability to 'block load' or 'block store' certain register sets (unfortunately, the present compilers don't make particularly good use of this feature). Another major difference between the two types of machines is the presence in the CRAY of several different functional units, each with different timing characteristics. This requires extra logic to reserve registers until the operation is completed.

So far I have described only the scalar part of the CRAY machine, and for good reason. Even without vector operations, the CRAY is MUCH faster than a VAX. I suspect that a VLSI version of the CRAY scalar instruction set would be able to outperform a VAX built with the same technology. The advantages of the reduced instruction set combined with the simpler memory interface (only two addressing modes with NO virtual memory support) would allow the 'micro CRAY' to be clocked at much higher rates. Of course, I doubt that the CRAY architecture could be put on a single chip with today's technology, but it could probably be done with a small set of chips for each functional unit.

Programming a RISC machine is simple compared to a CISC machine - far from being 'woefully inadequate', the RISC type of machine seems just right. On a CISC machine there are usually about half a dozen different ways of performing any given function, and the most obvious is usually NOT the fastest, or even close. On a RISC machine, the most obvious code sequence is almost always the fastest - it may be the ONLY obvious code sequence. After 17 years of assembly coding I came to the conclusion that the CRAY instruction set was the easiest to use of any machine I have seen. And after two years of compiler maintenance on the CRAY I concluded that the instruction set was the easiest to write a compiler for as well (the CRAY compiler is such a poorly written thing that it would probably never have even worked on another machine).
The only really difficult part is scheduling vector operations, which became much easier on the new X/MP machines.

A word needs to be said about the lack of addressing modes and virtual memory. At the speeds at which RISC machines will run (not the demo units made from MOS but the real production chips that (I hope) will come out) memory will be the slowest component of the system. On the CRAY, only the reciprocal approximate is slower than a memory fetch; all other operations are at least twice as fast (integer add is 7 times as fast, logical operations are 14 times as fast). Staged memory is a help (several fetches or stores going simultaneously), but all the other functional units are staged as well. It makes sense to limit memory traffic to just loads and stores so that other functional units don't end up waiting for memory references. It also makes sense to limit the number of addressing modes so that memory traffic doesn't get even slower due to the extra checking and circuitry in the memory interface.

If memory traffic is slow, then traffic to the secondary storage (disk or whatever) is REALLY SLOW. The data transfer rate for the standard CRAY drive (CDC DD-29) is 38.7x10^6 bits/sec, and the sector size is 512 words (64 bits/word); that is a little less than a millisecond per sector - or about 68,000 cpu cycles!! This doesn't even count seek time, latency, or scheduling the traffic with the channel. Obviously, the operating system would have to suspend your task until the page had been loaded, and it is also clear that no amount of 'lookahead' in the paging scheme could significantly improve its performance.

The solution is not to page, but to provide a very large amount of central memory. With large central memory, there is always enough room for code (it's small) but data may still need to be kept on secondary storage.
Fortunately, it's usually possible to write code which anticipates its data needs and issues reads and writes (asynchronous of course) long in advance of the use of that data. Short of that, reads and writes don't do that much worse than paging would have done anyway.

I'm looking forward to the first commercial RISC chips (or chip sets). I expect that to be competitive they will have several functional units (each staged), only one or two addressing modes, a large central memory requirement, and no virtual addressing capability. With this combination, I think RISC could outrun any other small computer available.
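
For anyone who wants to check the disk arithmetic above, here is a minimal sketch in Python. The transfer rate, word size, and sector size are the DD-29 figures quoted earlier; the 12.5 ns clock period (the CRAY-1 cycle time) is an assumption added here, and the function and constant names are purely illustrative.

```python
# Figures quoted above for the CDC DD-29 drive
DISK_RATE_BITS_PER_SEC = 38.7e6   # sustained transfer rate
WORD_BITS = 64                    # bits per CRAY word
SECTOR_WORDS = 512                # words per sector

# ASSUMPTION (not stated above): CRAY-1 clock period of 12.5 ns
CLOCK_PERIOD_SEC = 12.5e-9


def transfer_time(words):
    """Seconds needed to move `words` words at the raw disk rate."""
    return words * WORD_BITS / DISK_RATE_BITS_PER_SEC


def cpu_cycles(seconds):
    """Number of CPU clock periods that elapse in `seconds`."""
    return seconds / CLOCK_PERIOD_SEC


sector_time = transfer_time(SECTOR_WORDS)
sector_cycles = cpu_cycles(sector_time)
word_cycles = cpu_cycles(transfer_time(1))

print(f"one sector: {sector_time * 1e3:.3f} ms, {sector_cycles:,.0f} cycles")
print(f"one word:   {word_cycles:,.0f} cycles")
```

Under these assumptions one 512-word sector takes roughly 0.85 ms to transfer, which is about 68,000 clock periods, and that is before seek time, rotational latency, or channel scheduling are counted.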