Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Not-so RISCy
Keywords: risc
Message-ID: <13259@winchester.mips.COM>
Date: 14 Feb 89 17:53:43 GMT
References: <732@wpi.WPI.EDU>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 69

In article <732@wpi.WPI.EDU> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
>
>Reduction of instruction set size/complexity is the main area of design which
>enhances speed in RISC processors.  Another area which I'm wondering about is
>data size handling.  Modern RISC processors handle 8, 16, 32 and 64 bit words.
>Some even handle data which crosses "word" boundaries (and on some (well one)
>the byte order can be changed).  The logic that must be dedicated to this must
>be incredible, plus this logic is in the memory data path and therefore might
>be a speed constraint (especially if the data goes through the ALU before
>being presented to the registers).  Would it be a terrible hardship to only
>have two data sizes (perhaps character and word) and not allow words to cross
>word boundaries?  Certainly it would require that people don't use "bad"
>programming techniques similar to what has to be done on 68000 or IBM 360.
>But would not the improvement in speed (by freeing up chip space to allow for
>more registers or to simply reduce data path delay time) be worth it?

1) Automatic handling of unaligned data is indeed expensive, which is why
RISC machines generally omit it.
2) You certainly need word & character operations [to match the statistics
of user programs.]  If you have to materialize halfword ops, UNIX kernel
code will suffer, for three reasons:
	a) There are many densely-encoded structures.  Some of those might
	convert shorts to ints, but that doesn't do anything about:
	b) Networking code has 16-bit things all over the place, and you have
	NO CHOICE about the sizes, and
	c) When dealing with arbitrary devices, across things like VME buses,
	you'd better be able to generate indivisible 16-bit loads/stores,
	or your choice of peripheral controllers will be impacted.  Some
	accesses must be exactly 16 bits wide to match the semantics of the
	devices.
Although MOST user programs don't use 16-bit quantities a lot, some do, a lot.
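
To illustrate point 2c, here is a minimal sketch in C of what a 16-bit
store turns into when only word loads and stores are available.  The
function name and the half numbering are assumptions made for this
sketch, not anything taken from a real machine or compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value into one half of a 32-bit word
 * using only a word load and a word store.  half_index 0 means bits 0..15,
 * half_index 1 means bits 16..31 (a numbering chosen for this sketch, not
 * tied to any particular byte order). */
void store_half_via_word(volatile uint32_t *word, unsigned half_index,
                         uint16_t value)
{
    assert(half_index < 2);
    unsigned shift = half_index * 16u;
    uint32_t mask = (uint32_t)0xFFFFu << shift;

    uint32_t old = *word;                    /* word load: reads BOTH halves  */
    *word = (old & ~mask)                    /* word store: writes BOTH halves */
          | ((uint32_t)value << shift);
}
```

The device sees one 32-bit read followed by one 32-bit write, so the
neighboring 16-bit register is read and rewritten as a side effect, and
the update is not indivisible.  That is harmless for ordinary memory but
can be wrong for a controller that expects exactly one 16-bit access.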

3) Once you have load word, load byte [signed|unsigned], and load half
[signed|unsigned], all of which you really want to have, it doesn't take
much more logic to do the unaligned operations (as separate instructions,
NOT as an automatic thing that happens for unaligned operations).
4) Once you have all of that, it actually takes very little logic to do
the byte-ordering swapping: in fact, what really happened was that the
alignment network that shuffles bytes around anyway just got more complete.
Oddly enough, I don't think it ended up taking any more silicon space,
as the width was the same (32 bits), and the height was already forced by
other constraints.
5) As usual, most of this has to be determined scientifically, by simulation
of the impact of omitting the partial-word instructions.  It is interesting
that at least {HP, MIPS, Sun, Motorola} all came to the same conclusion on
this (namely, to include the partial-word loads/stores).   In our case, we had some
heritage of word+byte only (Stanford MIPS); I wouldn't put UNIX on a machine
that didn't have 16-bit operations, even though many user-level statistics
wouldn't justify their presence.
6) The unaligned load/store operations have proved absolutely invaluable.
People may be able to clean up their act on new code, but sometimes they
have huge databases that have alignment problems.  The unaligned operations
turn out to be useful for C strings, COBOL+PL/1, and for porting large
FORTRAN programs that have COMMON+EQUIVALENCE combinations that effectively
prohibit "correct" alignment, especially if these came from the IBM or DEC
worlds...which a few programs do.  If you own a 2-million line CAD program,
which you didn't write, and which contains code thru which the armies have
marched thru the years, you do NOT want to be told that you must rework the
program before you can get it to work the very first time.  It's a lot easier
to turn on a compiler switch that uses the unaligned instructions, typically
losing 10-15% of performance, and either tune it later, or not bother at all,
but at least get the application working....
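
As a rough model of the separate unaligned-load instructions in points 3
and 6, the C sketch below assembles an unaligned 32-bit word from two
aligned word reads plus a shift and merge.  It is patterned on the MIPS
"load word left" / "load word right" pair in big-endian mode, but the
function names are made up for this sketch and it is only a software
model, not a description of the actual hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the aligned 32-bit word containing byte address addr, treating
 * memory as big-endian (lowest address = most significant byte). */
uint32_t aligned_word_be(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)3;
    return ((uint32_t)mem[base] << 24) | ((uint32_t)mem[base + 1] << 16) |
           ((uint32_t)mem[base + 2] << 8) | (uint32_t)mem[base + 3];
}

/* "Load left": put the bytes from addr up to the end of its aligned word
 * into the high end of reg; the remaining low bytes of reg are kept. */
uint32_t load_left_be(uint32_t reg, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (unsigned)(addr & 3u);
    uint32_t keep = shift ? (0xFFFFFFFFu >> (32u - shift)) : 0u;
    return (aligned_word_be(mem, addr) << shift) | (reg & keep);
}

/* "Load right": put the bytes from the start of addr's aligned word up to
 * addr into the low end of reg; the remaining high bytes of reg are kept. */
uint32_t load_right_be(uint32_t reg, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (3u - (unsigned)(addr & 3u));
    uint32_t keep = shift ? (0xFFFFFFFFu << (32u - shift)) : 0u;
    return (aligned_word_be(mem, addr) >> shift) | (reg & keep);
}

/* An unaligned word load is the pair applied to addr and addr + 3. */
uint32_t load_word_unaligned_be(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    uint32_t reg = 0;
    reg = load_left_be(reg, mem, addr);
    reg = load_right_be(reg, mem, addr + 3);
    return reg;
}
```

Every memory reference in the model is an aligned word read; the extra
work is only shifting and merging, which fits the remark in point 3 that
not much logic is needed beyond the byte and halfword loads.  The cost
is two instructions per possibly-unaligned load instead of one.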

Anyway, it's a good question: it's always good to question why features are
included.  In this case, there are good reasons.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086