Path: utzoo!attcan!uunet!ncrlnk!ncr-sd!hp-sdd!hplabs!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: A simple question on RISC
Message-ID: <23541@amdcad.AMD.COM>
Date: 15 Nov 88 01:47:54 GMT
References: <6544@xanth.cs.odu.edu> <75577@sun.uucp> <1618@imagine.PAWL.RPI.EDU> <419@augean.OZ>
Sender: news@amdcad.AMD.COM
Reply-To: tim@crackle.amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 40

In article <419@augean.OZ> idall@augean.OZ (Ian Dall) writes:
| In article <75577@sun.uucp> khb@sun.UUCP (Keith Bierman - Sun Tactical Engineering) writes:
| >
| > Or if they (wizzbang instructions) got used, it was
| > so rare that it didn't matter. Or they got used, and it was slower
| > than some combination of simple instructions. Or all of the above.
|
| Can anyone tell me *why* some of these microcoded instructions were
| slower than a combination of simpler instructions on the same machine?
| I am not debating CISC vs RISK here since both cases run on the *same*
| (cisc) machine. If nothing else the second case must have resulted in
| more memory accesses for instruction fetches. Was the difference
| simply incompetence on the part of the micro code writer, or is there
| some reason for this.

There are a number of reasons, most of them due to the main problem of
limited microcode space:

1) "Free" microcode sequences. "Hey, look at this! If we just change
one input to the 'add' sequence, we get 'clear'!" (Too bad clear now
reads the data before storing a zero into it.)

2) Limited data areas (and thus, limited algorithms for code
sequences). Microcode doesn't normally have access to arbitrary
precompiled data tables in memory like a library routine does, so we
see things like the recent ARM reverse-bit sequence, which is slower
than a standard table lookup.

3) Microcode is not as easily optimized.
A general microcode routine to multiply two integers is easy enough to
write, but it is usually faster to perform a series of shifts and adds
in macrocode when you are multiplying by a constant. One could write
each of these optimal sequences in microcode, and have a horrendous
set of mul_by_5, mul_by_37, etc. instructions, but it is much easier
and less wasteful to let the compiler handle it.

	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)
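
Points 2 and 3 above can be illustrated with a minimal C sketch. It is
an illustration under stated assumptions, not the ARM sequence or any
particular compiler's output: the function names reverse_byte_table and
reverse_byte_loop and the table rev_nibble are hypothetical, and
mul_by_5 and mul_by_37 merely borrow the names used above for what a
compiler might emit inline. Whether the table version actually wins
depends on the machine and its memory system.

```c
#include <stdint.h>

/* Point 2: a precompiled data table in ordinary memory.
 * rev_nibble[i] holds the 4-bit value i with its bits reversed.
 * (Hypothetical name; a 256-entry byte table would also work.) */
static const uint8_t rev_nibble[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse the bits of a byte with two table lookups:
 * the reversed low nibble becomes the new high nibble, and
 * the reversed high nibble becomes the new low nibble. */
uint8_t reverse_byte_table(uint8_t b)
{
    return (uint8_t)((rev_nibble[b & 0x0F] << 4) | rev_nibble[b >> 4]);
}

/* The same result computed one bit at a time, roughly what a
 * routine with no access to a data table is limited to. */
uint8_t reverse_byte_loop(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1)); /* move lowest bit of b into r */
        b >>= 1;
    }
    return r;
}

/* Point 3: multiplication by a known constant as shifts and adds.
 * 5 = 4 + 1, so x*5 = (x << 2) + x. */
uint32_t mul_by_5(uint32_t x)
{
    return (x << 2) + x;
}

/* 37 = 32 + 4 + 1, so x*37 = (x << 5) + (x << 2) + x. */
uint32_t mul_by_37(uint32_t x)
{
    return (x << 5) + (x << 2) + x;
}
```

A compiler that sees a multiply by a constant can choose a sequence
like these per call site, which is the job the post suggests leaving to
it rather than to a family of special-purpose instructions.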