Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!tektronix!reed!mdr
From: mdr@reed.UUCP (Mike Rutenberg)
Newsgroups: comp.arch
Subject: Re: RISC vs unaligned data
Message-ID: <12264@reed.UUCP>
Date: 31 Mar 89 23:19:54 GMT
References: <355@bnr-fos.UUCP> <13@microsoft.UUCP> <16058@cup.portal.com> <370@bnr-fos.UUCP> <11222@tekecs.GWD.TEK.COM>
Reply-To: mdr@reed.UUCP (Mike Rutenberg)
Organization: Reed College, Portland OR
Lines: 31

Andrew Klossner writes:
>For example, on the 88k, an architecture that doesn't have particularly
>good support for unaligned data, the compiler might generate code like
>this to fetch a word from an address that it knows will be odd:
	[code example]
>If the word is in the data cache, this takes seven cycles and wastes
>two scratch registers (r2 and r3).  (The code to fetch from an even but
>unaligned address takes five cycles.)  With hardware support it could
>do a better job ... but is it necessary to fetch an unaligned word in
>fewer than seven cycles?  That fetch takes fewer nanoseconds than it
>does on the modern, unalignment-forgiving CISC machine that I'm typing
>this on, which after all is the bottom line in RISC vs CISC.


The main problem is that the seven-cycle sequence you gave takes up
i-cache space and, as you indicated, needs scratch registers.  If the
processor can handle unaligned accesses in hardware, accepting the
performance hit relative to aligned accesses, you may get better
i-cache performance.  This becomes a bigger deal with larger programs
that make a lot of unaligned references.
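
To make the code-size point concrete, here is a rough C sketch of what
software has to do when the hardware won't fetch an unaligned word for
it.  This is not the 88k sequence from the quoted article; the function
names and the big-endian word layout are my own assumptions for
illustration.  Either routine gets expanded inline at every unaligned
reference, which is where the i-cache cost comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch a 32-bit big-endian word from any byte
 * address using four byte loads plus shifts and ORs. */
static uint32_t load_u32_bytes(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: fetch the 32-bit word that starts byte_off bytes
 * into an array of aligned words, using at most two aligned word loads
 * and a shift/merge.  The words are treated as big-endian values, i.e.
 * the most significant byte of words[0] is byte 0 of the stream. */
static uint32_t load_u32_two_words(const uint32_t *words, size_t nwords,
                                   size_t byte_off)
{
    size_t   idx = byte_off / 4;                    /* aligned word holding the first byte */
    unsigned sh  = (unsigned)(byte_off % 4) * 8;    /* misalignment in bits */

    if (sh == 0) {
        assert(idx < nwords);
        return words[idx];                          /* aligned: one load is enough */
    }

    assert(idx + 1 < nwords);                       /* the word straddles two aligned words */
    return (words[idx] << sh) | (words[idx + 1] >> (32 - sh));
}
```

With the bytes 11 22 33 44 55 66 77 88 in memory, both routines return
0x22334455 for the word starting at byte offset 1.  A machine that
accepts unaligned addresses does the same job with a single load
instruction, at the cost of extra memory cycles.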

The Intel 80960KA has a nice memory interface that is fast and allows
unaligned data references.  Among the things that assist with unaligned
accesses is a 3*(1-8 byte) FIFO for outstanding memory write requests
(there is a similar FIFO for read requests).

Mike
-- 
Mike Rutenberg      Reed College, Portland Oregon     (503)239-4434 (home)
BITNET: mdr@reed.bitnet      UUCP: uunet!tektronix!reed!mdr
Note: These are personal remarks and represent no known organization --mdr