Path: utzoo!attcan!uunet!ncrlnk!ncr-sd!hp-sdd!hplabs!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: A simple question on RISC
Message-ID: <23541@amdcad.AMD.COM>
Date: 15 Nov 88 01:47:54 GMT
References: <6544@xanth.cs.odu.edu> <75577@sun.uucp> <1618@imagine.PAWL.RPI.EDU> <419@augean.OZ>
Sender: news@amdcad.AMD.COM
Reply-To: tim@crackle.amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 40

In article <419@augean.OZ> idall@augean.OZ (Ian Dall) writes:
| In article <75577@sun.uucp> khb@sun.UUCP (Keith Bierman - Sun Tactical Engineering) writes:
| >
| > Or if they (whizbang instructions) got used, it was
| > so rare that it didn't matter. Or they got used, and it was slower
| > than some combination of simple instructions. Or all of the above.
| 
| Can anyone tell me *why* some of these microcoded instructions were
| slower than a combination of simpler instructions on the same machine?
| I am not debating CISC vs RISC here, since both cases run on the *same*
| (cisc) machine.  If nothing else, the second case must have resulted in
| more memory accesses for instruction fetches.  Was the difference
| simply incompetence on the part of the microcode writer, or is there
| some reason for this?

There are a number of reasons, most of them due to the main problem of
limited microcode space:

	1) "Free" microcode sequences: "Hey, look at this!  If we just
	change one input to the 'add' sequence, we get 'clear'!"  (Too bad
	'clear' now reads the data before storing a zero into it.)

	2) Limited data areas (and thus a limited choice of algorithms
	for code sequences).  Microcode doesn't normally have access to
	arbitrary precompiled data tables in memory the way a library
	routine does, so we see things like the recent ARM bit-reversal
	sequence, which is slower than a standard table lookup.

	3) Microcode is not as easily optimized.  A general microcode
	routine to multiply two integers is easy enough to write, but it
	is usually faster to perform a series of shifts and adds in
	macrocode when you are multiplying by a constant.  One could
	write each of these optimal sequences in microcode and end up with
	a horrendous set of mul_by_5, mul_by_37, etc. instructions, but it
	is much easier and less wasteful to let the compiler handle it.
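To make point 1 concrete, here is a toy model in which 'clear' reuses a shared read-modify-write sequence. Everything in it is a hypothetical illustration: the Memory class, its access counters, and the idea of "changing one input" by forcing the ALU output to zero are assumptions, not how any particular machine's microcode worked. Counting memory accesses shows the shared 'clear' paying for an operand read it never uses, while a dedicated single-store 'clear' does not:

```python
# Hypothetical toy model: a memory that counts its own reads and writes.
class Memory:
    def __init__(self):
        self.cells = {}
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.writes += 1
        self.cells[addr] = value & 0xFFFFFFFF  # keep values 32-bit


def rmw_sequence(mem, addr, src, alu_op):
    """Shared read-modify-write 'microroutine': read operand, apply ALU op, write back."""
    operand = mem.read(addr)
    mem.write(addr, alu_op(operand, src))


def add_insn(mem, addr, src):
    # The original use of the shared sequence.
    rmw_sequence(mem, addr, src, lambda a, b: a + b)


def clear_shared(mem, addr):
    # "Free" clear: same sequence with the ALU output forced to zero.
    # The operand read still happens even though its value is ignored.
    rmw_sequence(mem, addr, 0, lambda a, b: 0)


def clear_dedicated(mem, addr):
    # A purpose-built clear: a single store, no read.
    mem.write(addr, 0)


m1, m2 = Memory(), Memory()
clear_shared(m1, 3)
clear_dedicated(m2, 3)
print("shared clear:   ", m1.reads, "reads,", m1.writes, "writes")
print("dedicated clear:", m2.reads, "reads,", m2.writes, "writes")
```

In this model the shared 'clear' generates exactly as much memory traffic as an add, which is how a plain store of zero (or some other combination of simpler instructions) could beat it on the same machine.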
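Point 2 can be sketched as follows. The loop is a generic table-free, bit-at-a-time reversal, not the actual ARM sequence referred to above; the table version assumes a hypothetical 256-entry REV8 table, built once, and needs only four lookups per 32-bit word. Running this in Python says nothing about relative speed on real hardware; the point is the step count, 32 loop iterations versus 4 lookups:

```python
# Hypothetical precompiled table: REV8[b] is byte b with its 8 bits reversed.
REV8 = [int(f"{b:08b}"[::-1], 2) for b in range(256)]


def reverse_bits_loop(x):
    """Table-free reversal of a 32-bit word, one bit per iteration (32 steps)."""
    r = 0
    for _ in range(32):
        r = (r << 1) | (x & 1)  # shift the low bit of x into r
        x >>= 1
    return r


def reverse_bits_table(x):
    """Reverse each byte via the table, then swap the byte order."""
    return (REV8[x & 0xFF] << 24 |
            REV8[(x >> 8) & 0xFF] << 16 |
            REV8[(x >> 16) & 0xFF] << 8 |
            REV8[(x >> 24) & 0xFF])


for v in (0x00000001, 0x12345678):
    print(hex(v), "->", hex(reverse_bits_table(v)), hex(reverse_bits_loop(v)))
```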
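Point 3 in the same spirit: mul_general is a sketch of the one-bit-per-step loop a general multiply routine might run, while mul_by_5 and mul_by_37 are the short shift-and-add sequences a compiler can emit because it knows the constant at compile time (5 = 4 + 1, 37 = 32 + 4 + 1). shift_add_plan is a hypothetical helper showing only the naive decomposition, one shifted term per set bit; real compilers may use smarter tricks such as subtraction:

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit unsigned arithmetic


def mul_general(a, b):
    """General shift-and-add multiply: one step per bit of the multiplier."""
    result = 0
    for _ in range(32):
        if b & 1:
            result = (result + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return result


def mul_by_5(x):
    # 5 = 4 + 1: one shift, one add
    return ((x << 2) + x) & MASK32


def mul_by_37(x):
    # 37 = 32 + 4 + 1: two shifts, two adds
    return ((x << 5) + (x << 2) + x) & MASK32


def shift_add_plan(c):
    """Shift amounts for a naive decomposition of constant c (one per set bit)."""
    return [i for i in range(c.bit_length()) if (c >> i) & 1]


print(shift_add_plan(37), mul_by_37(3), mul_general(3, 37))
```

The general loop always runs 32 steps, while the constant-specific sequences take a handful of instructions; writing one microcoded instruction per useful constant is exactly the "horrendous set" the compiler avoids.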


	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)