Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!tektronix!orca!tekecs!frip!andrew
From: andrew@frip.wv.tek.com (Andrew Klossner)
Newsgroups: comp.arch
Subject: RISC vs unaligned data
Message-ID: <11222@tekecs.GWD.TEK.COM>
Date: 31 Mar 89 18:47:49 GMT
References: <355@bnr-fos.UUCP> <13@microsoft.UUCP> <16058@cup.portal.com> <370@bnr-fos.UUCP>
Sender: andrew@tekecs.GWD.TEK.COM
Organization: Tektronix, Wilsonville, Oregon
Lines: 45

[]

	"the historical trend is to be progressively more tolerant of
	misalignment, e.g. IBM /360 /370, Motorola 68K families. All
	the "tolerant" machines always attach a *penalty* to
	misalignment. It is only the very recent crop of so-called RISC
	chips that is requiring alignment again."

Many contributors to this discussion seem to hold the opinion that, if
unaligned access isn't supported by hardware, it isn't supported at all.
But one of the points of RISC is to move complexity from hardware to
software.  Why not just let the compiler do it?

If the compiler knows the alignment of a word (the low two bits of the
address are a compile-time constant, as for an unaligned word within an
aligned structure), it can do a (slightly) better job than if it is
totally clueless about the runtime address.  PL/I used its
"UNALIGNED" attribute to advantage on the 360/370 machines.  A system
supplier willing to extend their C language could add a similar
construct to C.
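
As a rough illustration of what such an extension can look like, here is
a sketch using the `__attribute__((packed))` extension found in GCC and
Clang.  This is an assumption for illustration only, not something any
88k compiler is known to offer, and `struct record`, `VALUE_LOW_BITS`
and `read_value` are made-up names.  The offset of the word within the
structure is a compile-time constant, so if the structure itself sits on
a word boundary the low two bits of the word's address are known too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a one-byte tag pushes the word to offset 1. */
struct record {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));      /* GCC/Clang extension, not standard C */

/* Low two bits of the word's offset within the record, known at
   compile time. */
enum { VALUE_LOW_BITS = offsetof(struct record, value) & 3 };

/* The compiler, not the caller, is responsible for emitting whatever
   load sequence an unaligned field needs on the target. */
static uint32_t read_value(const struct record *r)
{
    assert(VALUE_LOW_BITS == 1);   /* layout check for this sketch */
    return r->value;
}
```

The source text stays an ordinary field access; what the access costs
is left to the code generator.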

For example, on the 88k, an architecture that doesn't have particularly
good support for unaligned data, the compiler might generate code like
this to fetch a word from an address that it knows will be odd:

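				; address of unaligned word to fetch is in r10
	ld.bu	r1,r10,0	; byte 0 (at the odd address)
	ld.hu	r2,r10,1	; bytes 1-2 (an aligned halfword)
	ld.bu	r3,r10,3	; byte 3
	mak	r1,r1,8<24>	; move byte 0 to bits 31..24
	mak	r2,r2,16<8>	; move bytes 1-2 to bits 23..8
	or	r1,r1,r2
	or	r1,r1,r3	; byte 3 supplies bits 7..0
				; word is in r1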

If the word is in the data cache, this takes seven cycles and wastes
two scratch registers (r2 and r3).  (The code to fetch from an even but
unaligned address takes five cycles.)  With hardware support it could
do a better job ... but is it necessary to fetch an unaligned word in
fewer than seven cycles?  That fetch takes fewer nanoseconds than it
does on the modern, unalignment-forgiving CISC machine that I'm typing
this on, and elapsed time, after all, is the bottom line in RISC vs CISC.
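
For readers who don't read 88k assembly, the odd-address sequence can be
sketched in C roughly as follows.  It assumes big-endian byte order, as
in the assembly, and takes the offset from a word-aligned base so that
the "known odd" case is explicit.  The function names are made up.  The
even-offset version (two halfword loads) is only a guess at the
five-cycle sequence, which the post does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian halfword built from two bytes; stands in for ld.hu. */
static uint32_t load_half_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

/* Word at an odd offset from a word-aligned base: byte, halfword,
   byte, then shift and OR the three pieces together. */
static uint32_t fetch_word_odd(const unsigned char *base, size_t off)
{
    assert(off % 2 == 1);
    const unsigned char *p = base + off;
    uint32_t b0 = p[0];                 /* like ld.bu r1,r10,0 */
    uint32_t h  = load_half_be(p + 1);  /* like ld.hu r2,r10,1 */
    uint32_t b3 = p[3];                 /* like ld.bu r3,r10,3 */
    return (b0 << 24) | (h << 8) | b3;  /* the mak/or steps */
}

/* Word at an even but unaligned offset (2 mod 4): assumed here to be
   two halfword loads and one merge. */
static uint32_t fetch_word_even(const unsigned char *base, size_t off)
{
    assert(off % 4 == 2);
    const unsigned char *p = base + off;
    return (load_half_be(p) << 16) | load_half_be(p + 2);
}
```

With a buffer holding the bytes 00 11 22 33 44 55 66 77, fetching at
offset 1 yields 0x11223344 and fetching at offset 2 yields 0x22334455.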

  -=- Andrew Klossner   (uunet!tektronix!orca!frip!andrew)      [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]