Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!iuvax!pur-ee!uiucdcs!uxc.cso.uiuc.edu!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: comp.arch
Subject: Re: What should be in hardware but isn'
Message-ID: <28200048@ccvaxa>
Date: Wed, 23-Sep-87 11:11:00 EDT
Article-I.D.: ccvaxa.28200048
Posted: Wed Sep 23 11:11:00 1987
Date-Received: Sat, 26-Sep-87 14:54:51 EDT
References: <581@l.cc.purdue.edu>
Lines: 101
Nf-ID: #R:l.cc.purdue.edu:581:ccvaxa:28200048:000:5598
Nf-From: ccvaxa.UUCP!aglew    Sep 23 10:11:00 1987

| cik@l.cc.purdue.edu (Herman Rubin)
| There are many instructions which are easy to implement in hardware, but
| for which software implementation may even be so costly that a procedure
| using the instruction may be worthless. Some of these instructions have
| been implemented in the past and have died because the ill-designed
| languages do not even recognize their existence. Others have not been
| included due to the non-recognition of them by the so-called experts and
| by the stupid attitude that something should not be implemented unless
| 99.99% of the users of the machine should be able to want the instruction
| _now_. As you can tell from this article, I consider the present CISC
| computers to be RISCy.

Well, I'm definitely a member of the RISC camp, but I do think that
Herman has raised two good points:

(1) there are instructions that are not in the 99% usage group that
    can still be more efficiently implemented in hardware than
    software.

(2) languages should provide better (more efficient) ways of
    interfacing to machine features, without the expense of wrapping
    them in procedure call overhead.

(1) Remember, frequency of use is not the fundamental criterion for
inclusion of an instruction. It's more like

    (Number of times that the operation is required)
      * (Speed of software implementation)/(Speed of hardware implementation)
      / (Slowdown of other instructions...)
With (customers who simply want a benchmark that can make good use of
that particular instruction) factored in. Of course, these are highly
nonlinear functions, and frequency of use is often a good
approximation, other things being equal. But other things are not
always equal.

"Unusual" instructions do not always mean microcode; if they can be
implemented combinatorially, especially without state machines, then
there may be no slowdown apart from wiring effects. And they may be
extremely slow to do in software.

One of my favorite examples of this is the bit-reversed indexing that
is so convenient for FFT applications. Bit reversing an 8 or 16 bit
address isn't too bad, but some applications are getting into the 32
bit range, and may be passing that soon. Even with table lookup in
decent sized tables, bit reversal is expensive in software - and yet
it can be easily done in hardware. If FFTs are a primary application
for your system it may be worthwhile looking at bit reversal or
carry-reversed addition. Even if they aren't, the instruction may be
so cheap to implement that it is worth including - because it may
turn out to be important in the future.

Of course, there is a designer's trap here: each additional
instruction may appear cheap, but the cost of a horde of extra
instructions may exceed the sum of their individual costs, because
together they push you past a hard limit of your implementation, like
chip size.

I try to avoid instructions that require sequencing, but occasionally
amuse myself by thinking of purely combinatorial operations that might
be useful, and cheap to implement. Like bit reversal, or counting the
number of set bits in an instruction, or branching (ooops) based on
the uppermost three bits. Unfortunately, there aren't too many.

(2) The second point, about interfacing to high level languages, is
more important. Say that I do have a code that needs to count the
number of set bits in a bitstring, and it already uses a POP function.
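
Such a POP function, written in portable C, might look like the sketch
below, which also includes a loop-based bit reversal of the kind
discussed under (1). The names pop32, bitrev and bits_selfcheck are
made up for illustration; a production code would more likely use
table lookup. Either way, each call pays for a loop plus procedure
call overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in x. The idiom x &= x - 1 clears the lowest
   set bit, so the loop runs once per set bit. */
unsigned pop32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Reverse the low nbits bits of x (1 <= nbits <= 32), as needed for
   bit-reversed FFT indexing. One loop iteration per bit. */
uint32_t bitrev(uint32_t x, unsigned nbits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < nbits; i++) {
        r = (r << 1) | (x & 1u);   /* move the low bit of x into r */
        x >>= 1;
    }
    return r;
}

/* A few checks that double as usage examples. */
void bits_selfcheck(void)
{
    assert(pop32(0u) == 0);
    assert(pop32(0xF0F0u) == 8);
    assert(bitrev(0x1u, 8) == 0x80u);   /* 00000001 -> 10000000 */
    assert(bitrev(0x6u, 3) == 0x3u);    /* 110 -> 011 */
}
```
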
Well, I can simply replace the POP function by my POP instruction,
can't I? Unfortunately, most $%^^#@!!! languages do not let you; you
either have to wrap the POP instruction in a function call, which
loses most of its benefit, or you have to use asms after putting stuff
into register variables that you *KNOW* the compiler maps to R7 and
R6... Not good either way.

It would be nice if the compiler could be made to know about every
instruction in the machine, even though it didn't generate code for
them; it would be nice if you could say

    count = asm_pop(bitreg)

and the compiler could say

    "Oh, he wants to use the pop instruction that this strange
    machine provides. I don't know how to use it myself, but I know
    how to arrange for the programmer to use it. Let's see, it takes
    an input in a register and puts an output into a register. He
    wants pop of bitreg - well, that's in memory, so it needs to be
    fetched into a register. There's no register free, so I'll have
    to save R7. Now, he wants the result in count. Oh, that's already
    in a register R3. Well, I just have to unsave R7, and now I've got

        save R7
        R7 = bitreg
        pop R7 -> count
        unsave R7

    Well, that's a nice bit of code that I wouldn't have been able to
    produce automatically, but at least I was able to arrange the
    register use for my programmer"

I've just received a UNIX PC in the fire sale, and I'm told that the
inline assembler functions can be made to do something like this, so
maybe the world is slowly getting better.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa

I always felt that disclaimers were silly and affected, but there are
people who let themselves be affected by silly things, so: my opinions
are my own, and not the opinions of my employer, or any other
organisation with which I am affiliated.
I indicate my employer only so that other people may account for any possible bias I may have towards my employer's products or systems.
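
To illustrate point (2): the `count = asm_pop(bitreg)` idea can be
sketched with GCC-style extended asm. That choice is an assumption
here; the post names no particular compiler. The programmer supplies
the instruction text, and the "r" constraints leave register
selection, loads and spills to the compiler. The x86 `popcnt`
instruction stands in for the post's hypothetical POP instruction, and
asm_pop_selfcheck is a made-up helper. Where `popcnt` is not enabled
(the `__POPCNT__` macro is undefined), the sketch falls back to a
plain C loop.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t asm_pop(uint32_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__) && \
    (defined(__x86_64__) || defined(__i386__))
    uint32_t count;
    /* The compiler does not generate popcnt from this text itself; it
       only picks a register for each "r" operand and fetches or saves
       values around the instruction as needed. */
    __asm__("popcnt %1, %0" : "=r"(count) : "r"(x) : "cc");
    return count;
#else
    /* Assumed fallback when the instruction is not available:
       clear the lowest set bit until none remain. */
    uint32_t count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
#endif
}

/* Usage in the shape the post asks for: no hand-chosen registers. */
void asm_pop_selfcheck(void)
{
    uint32_t bitreg = 0xF0F0u;
    uint32_t count = asm_pop(bitreg);
    assert(count == 8);
    assert(asm_pop(0u) == 0);
    assert(asm_pop(0xFFFFFFFFu) == 32);
}
```

Because asm_pop is a static inline function, a compiler that inlines
it avoids the procedure call overhead the post complains about, while
the operands stay ordinary C variables.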