Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!seismo!rochester!ritcv!cci632!rb
From: rb@cci632.UUCP (Rex Ballard)
Newsgroups: net.arch
Subject: What RISC is REALLY all about!
Message-ID: <193@cci632.UUCP>
Date: Mon, 7-Jul-86 20:41:16 EDT
Article-I.D.: cci632.193
Posted: Mon Jul 7 20:41:16 1986
Date-Received: Fri, 11-Jul-86 04:43:18 EDT
References: <475@elmgate.UUCP> <789@petsd.UUCP> <481@elmgate.UUCP>
Reply-To: rb@ccird1.UUCP (Rex Ballard)
Organization: CCI, Rochester Development, Rochester, NY
Lines: 104
Summary: parallelism, primitives.

In article <481@elmgate.UUCP> jdg@elmgate.UUCP (Jeff Gortatowsky) writes:
>
>That's not what I meant. I was just citing an example. The same DBcc
>instruction could, of course be 3 separate CISC instructions as well, using
>32 bit quantities. If the CISC vs. RISC subject was the same as saying "the
>68000 vs. RISC", I might better understand the hoopla associated with RISC.
>Indeed most of the talk is comparing less well known mini or mainframe
>computers, not micros. One exception to this is the VAX. It's classified
>as a mini (CISC) but is, of course, very well known.

Sure, RISC is not new; it has existed in larger architectures for some time, and those machines provide a good source of background.

>In short I was not asking what is wrong with the 68000 CPU's. Just whether
>RISC is REALLY an improvement in computer design? If so why?

Yes and no. This is like asking if an ALL ROM operating system is better than an all RAM operating system. There are some tradeoffs.

Remember the early Apple IIs, TRS-80's, and Ataris? These boxes, among others, had a large ROM operating system, command language, utilities, and drivers. This was primarily because these early boxes were cassette based systems. Now, with floppies and hard disks as part of the "minimum package", it is less desirable to have the OS rommed in.
Even those who do have heavy ROM tend to use it as a library of primitives, rather than as the "final system interface".

These same factors are now becoming significant at the CPU level. RAM cache is a major factor, as is pipelining. When it is possible to get an order of magnitude faster "local storage", even in small quantities, the costs and benefits become worth considering.

As you pointed out earlier, much of the RISC architecture comes right off the old minis and mainframes. Anybody remember when a computer was really powerful if it had a 4K "core" (magnetic beads), a large "drum", and a tape drive? Just as those old machines had special "controllers" which managed, loaded, and stored data to/from these successively slower media, the RISC chips tend to have sub-chip level "controllers" of their own. Remember, CPUs and "controllers" are just different flavors of finite state machines.

Fortunately, most of the mechanics have been hidden using modern hardware and software techniques, again mostly taken from minis and mainframes.

>It was always
>my feelings that, if a CPU manufacturer were to write the language compilers
>first, THEN generate a CPU design to run it, we'd all be a lot happier. Yes,
>that sounds like a CISC design. But, am I wrong saying that INMOS took
>that approach with the TRANSPUTER? Am I wrong in assuming the TRANSPUTER
>is a RISC CPU? If I'm not wrong then RISC vs CISC seems like a useless
>argument, as INMOS' product proves that the CPU should fit the language it
>runs, not the other way around. CISC or RISC notwithstanding.

One of the nice features of the TRANSPUTER is that the primitives used in OCCAM could be used in other languages. In addition, other primitives could be added, changed, or deleted, to make a super-fast 'C' machine, or a Forth machine, or Prolog, or Smalltalk, or Lisp, or ??.

In some ways, RISC is a possible LOSE. You have been looking at only the functionality of a single instruction vs.
three instructions. In some cases, such as the "frame save" or "poly" or "context switch" operations, it may even be necessary to add the overhead of a "call" instruction, but the call instruction can also be easily and quickly optimized.

Most programmers today are "top down" trained, and not used to thinking in terms of primitives. RISC, however, makes thinking in terms of primitives, not only at the compiler level but at the project level, even more significant.

Here is where one begins to see the advantages of RISC. When it becomes possible to pack more and more "primitives" into the system, and have them automatically arrange themselves into the most efficient configuration within a few hundred cycles, things start to speed up at the application level.

Ironically, your DBcc example is interesting. What would you put inside the loop? Suppose you could use 16 or 32 bytes of instructions inside the loop. Suppose in addition, that you wanted to loop 2000 times. Now in a CISC, you might want to put the DBcc in "ROM", but how about strncpy, and the rest of libc.a?

Now since you only need at most a few hundred bytes, you can afford to buy and use very FAST RAM. You could build a little 4K cache and have about 5 layers deep worth of nesting, without ever stalling (slowing down the CPU with wait cycles) on a cache miss, because the "pre-fetcher" is loading the next subroutine while you're inside a "sister" subroutine's loop. Not only that, but you've got more time to prefetch from slower memory, because you are in a loop.

If you want to watch the RISC system get a little crazy, get a massively large routine of, say, 6K inside one massive loop, loaded with "macro expansions", and watch performance drop through the floor :-).

CISC, on the other hand, can save you a few FETCH cycles.
Of course, there is nothing to stop you from putting these RISC features in a CISC, except time, money, chip and board space, effort co-ordinating arbitrary length instructions with co-processors, extra delays for microcode synchronization, data-path turn-around, microbus arbitration, external bus arbitration, ... :-).

Seriously, there may be features of CISC that will need to be incorporated into RISC. But as these features are added, they will probably be done via hardware, rather than micro-code. Things like address calculations, multiply and divide, TLB changes, and pre-fetchers may end up becoming smarter as individual units.
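
For anyone who wants to play with the loop-in-a-small-cache argument from earlier in this article, here is a rough sketch in Python. It is a toy model, not a description of any real chip: the 16-byte line size, the LRU replacement policy, and the function name count_misses are all assumptions of mine, and there is no pre-fetcher in it at all. It only counts how often a straight-line loop body has to go out to slower memory when it does or does not fit in a 4K cache.

```python
from collections import OrderedDict


def count_misses(loop_bytes, iterations, cache_bytes=4096, line_bytes=16):
    """Count cache-line misses for a straight-line loop body.

    Toy model (all parameters are illustrative assumptions): a fully
    associative instruction cache with LRU replacement and no pre-fetcher.
    """
    capacity = cache_bytes // line_bytes           # lines the cache can hold
    lines_in_loop = -(-loop_bytes // line_bytes)   # ceiling division
    cache = OrderedDict()                          # oldest entry first
    misses = 0
    for _ in range(iterations):
        for line in range(lines_in_loop):
            if line in cache:
                cache.move_to_end(line)            # mark as most recently used
            else:
                misses += 1                        # fetch from slower memory
                cache[line] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)      # evict least recently used
    return misses


if __name__ == "__main__":
    # 32-byte loop body, 2000 trips: only the first trip misses.
    print(count_misses(32, 2000))
    # 6K loop body in a 4K cache: under LRU every line fetch misses.
    print(count_misses(6 * 1024, 2000))
```

Under these assumptions the 32-byte loop costs two misses in total, while the 6K loop misses on every single line fetch, since LRU always throws out the line the loop will want next. A real part with a pre-fetcher or a different replacement policy would behave differently; the sketch only shows why the size of the loop body relative to the cache matters so much.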