Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!quintus!ok
From: ok@quintus.uucp (Richard A. O'Keefe)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC --more misconceptions
Message-ID: <623@quintus.UUCP>
Date: 3 Nov 88 04:45:46 GMT
References: <156@gloom.UUCP> <18931@apple.Apple.COM> <40@sopwith.UUCP> <19762@apple.Apple.COM> <1002@l.cc.purdue.edu> <19811@apple.Apple.COM>
Sender: news@quintus.UUCP
Reply-To: ok@quintus.UUCP (Richard A. O'Keefe)
Organization: Quintus Computer Systems, Inc.
Lines: 49

In article <19811@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
[Talking about integer multiplication and division.]
>I'm going further than that. I'm saying they are rare because they are
>unnecessary. They are rare because in the USUAL case they can be strength
>reduced to additions by an optimizing compiler. This is faster than using
>the obvious multiply instruction.
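
[For readers unfamiliar with the term: a minimal sketch of the strength
reduction being described, for the address-calculation case.  The function
names and the strided-array setting are illustrative assumptions, not taken
from the quoted article; a real compiler does this on its intermediate code
rather than on the source.]

```c
#include <stddef.h>

/* Naive form: one multiply per iteration to compute i * stride. */
long sum_strided_mul(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* Strength-reduced form: the product i * stride is replaced by a running
   offset that is advanced by one addition per iteration. */
long sum_strided_add(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    size_t off = 0;               /* always equal to i * stride */
    for (size_t i = 0; i < n; i++) {
        sum += a[off];
        off += stride;
    }
    return sum;
}
```

Both functions read the same elements; only the way the index is formed
differs.  Note that this only works because the multiplicand steps
regularly with the loop.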

Did you notice the implicit assumption that multiplications are only
for address calculations?  Avoidable multiplications are rare because
a generation of programmers has been brainwashed that Hardware Rules,
and if some potentially useful operation is expensive it is their job
to avoid it rather than have the hardware and compiler people get it
right.  People are still avoiding procedure calls (and RISC designers
are assuming that procedure calls are not deeply nested) because old
designs made procedure calls expensive.

The one which is really painful is division.  When one codes up a hash
table, one knows (having read the literature) that remainder with a
prime is a Good Thing.  But one also knows that whizzbang machine X
has no hardware support for division, so to avoid a subroutine call to
a routine not known for its speed one sighs, puts in X & 4095 (instead
of X % 4093 or whatever), and wishes...
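
To make the trade-off concrete, here is a small sketch of the two ways of
turning a hash value into a bucket index.  The names and table sizes are
illustrative assumptions: 4096 stands for the power-of-two table, and 4093
is simply a prime just below it.

```c
#include <stdint.h>

#define TABLE_POW2  4096u   /* power-of-two size: index by masking       */
#define TABLE_PRIME 4093u   /* prime size: index by remainder (a divide) */

/* Cheap everywhere, but looks only at the low 12 bits of the hash. */
static inline uint32_t bucket_mask(uint32_t h)
{
    return h & (TABLE_POW2 - 1u);
}

/* Needs a divide (or a subroutine call on a machine without one),
   but every bit of the hash influences the result. */
static inline uint32_t bucket_prime(uint32_t h)
{
    return h % TABLE_PRIME;
}
```

With keys that are multiples of 4096 (page-aligned addresses, say),
bucket_mask sends every key to bucket 0, while bucket_prime spreads them
over distinct buckets.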

But to be realistic about this, let's compare a couple of CISCs with
what a good RISC might do.  The issue is not absolute speed, but the
intensity of the temptation to distort your code to avoid a function
perceived as expensive.  I measure this as cost/(cost of ADD).

		MC68020	80386	generic	R2000	WBMX	88k
MULS.L/ADD.L	~ 20	~ 5-10	~18	14	~44	4
DIVS.L/ADD.L	~ 45	~ 20	~35	35	~150	39+possible trap

MC68020 figures from manufacturer's manual
80386   figures from manufacturer's manual
generic figures assume a 2-bit-at-a-time multiply step and a
	1-bit-at-a-time divide step
WBMX	multiply from Whizzbang Ltd's manual, divide _estimated_ from manual;
	figures include procedure call overhead.  WBMX has no divide step.
88k	figures from article <4759@pdn.UUCP> (Alan Lovejoy)
R2000	figures from article <7472@winchester.mips.COM> (Charlie Price)
	(1 for main op, + delay time 12 or 33, + 1 to pick up result)
The 80386 and 88k multiplies deliver 32 bits, the others 64.
The R2000 figures are worst case: other integer operations can be
overlapped with all but two of these cycles.  It would be interesting
to know how often this pays off.
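
As a rough illustration of where the "generic" figures come from, here is
a software model of the two step schemes: a multiply that retires two
multiplier bits per step and a restoring divide that retires one quotient
bit per step.  The function names are hypothetical, the model is unsigned
only (the table is for signed operations), and the step counters are there
purely to show the counts.  For 32-bit operands that gives 16 and 32
steps; the difference from the ~18 and ~35 in the table is presumably
setup and fix-up, which is a guess on my part.

```c
#include <stdint.h>

/* Multiply, two multiplier bits per step: 16 steps for 32 bits. */
uint64_t mul_2bit_steps(uint32_t a, uint32_t b, int *steps)
{
    uint64_t acc = 0;
    uint64_t m = a;
    *steps = 0;
    for (int i = 0; i < 32; i += 2) {
        uint32_t digit = (b >> i) & 3u;        /* next two multiplier bits */
        uint64_t partial = 0;
        if (digit & 1u) partial += m;          /* add 1 * multiplicand */
        if (digit & 2u) partial += m << 1;     /* add 2 * multiplicand */
        acc += partial << i;
        ++*steps;
    }
    return acc;
}

/* Restoring divide, one quotient bit per step: 32 steps for 32 bits.
   The caller must ensure d != 0. */
uint32_t div_1bit_steps(uint32_t n, uint32_t d, uint32_t *rem, int *steps)
{
    uint64_t r = 0;
    uint32_t q = 0;
    *steps = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);        /* bring down next dividend bit */
        q <<= 1;
        if (r >= d) {                          /* trial subtraction succeeds */
            r -= d;
            q |= 1u;
        }
        ++*steps;
    }
    *rem = (uint32_t)r;
    return q;
}
```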

The bottom line is that architectures should _support_ the operations
programmers find useful, but that some architects have shown that good
enough support can be had by doing part of an operation in hardware,
part in software.  Too bad about Whizzbang.