Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!bnrgate!bigsur!bnr-rsc!bcarh185!schow
From: schow@bcarh185.bnr.ca (Stanley T.H. Chow)
Newsgroups: comp.arch
Subject: Handling mis-alignment (was Re: RISC Machine Data Structure Word Alignment Problems?)
Message-ID: <2038@bnr-rsc.UUCP>
Date: 2 Feb 90 19:17:32 GMT
References: <3428@odin.SGI.COM> <AGLEW.90Jan31211451@dwarfs.csg.uiuc.edu>
Sender: news@bnr-rsc.UUCP
Reply-To: bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow)
Organization: BNR Ottawa, Canada
Lines: 64

In article <AGLEW.90Jan31211451@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>
>Microcoded unaligned data takes two cycles to load an unaligned datum.
>(Assuming the unaligned datum overlaps two data bus widths.)  MIPSco
>style load-left and load-right take two cycles to load the same
>unaligned datum.

As you point out later, a lot depends on the actual alignment relative
to the bus. A lot also depends on the hardware available. It is not true
that all microcode (or H/W) takes two cycles. It is true that every RISC
ISA announced to date takes a minimum of two instructions.
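To make the two-instruction cost concrete, here is a minimal C sketch of
what a must-align machine does for a possibly misaligned 32-bit load: two
aligned word loads plus a shift-and-merge, roughly the effect of a
load-left/load-right pair on a big-endian machine. The simulated memory
array and the helpers load_aligned_be32 and load_unaligned_be32 are
hypothetical; this models the cost, not the exact MIPS instruction
semantics (which merge bytes into the destination register).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aligned load: the only memory operation our imaginary
 * must-align RISC provides. Reads the big-endian 32-bit word at a
 * 4-byte-aligned byte address in a simulated memory array. */
static uint32_t load_aligned_be32(const uint8_t *mem, size_t addr)
{
    assert(addr % 4 == 0);
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* Possibly misaligned load built from two aligned loads and a merge.
 * Both loads are always issued, because the compiler cannot know the
 * alignment in advance. Requires 8 readable bytes from the word
 * boundary at or below addr. */
static uint32_t load_unaligned_be32(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)3;            /* round down to a word */
    unsigned off = (unsigned)(addr - base);     /* byte offset, 0..3 */
    uint32_t hi = load_aligned_be32(mem, base);     /* instruction 1 */
    uint32_t lo = load_aligned_be32(mem, base + 4); /* instruction 2 */
    uint64_t pair = ((uint64_t)hi << 32) | lo;  /* 8 bytes, big-endian */
    return (uint32_t)(pair >> (8 * (4 - off))); /* keep bytes off..off+3 */
}
```

For example, with bytes 00 11 22 33 44 55 66 77 at addresses 0-7, a load
at address 1 yields 0x11223344. Widening to 64 bits avoids the undefined
shift by 32 that a plain 32-bit merge would hit in the aligned case.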

Also, note that for microcode or H/W, the extra cycles (if any) may well
be hidden in some pipeline stages, whereas the RISC instructions must be
issued one per clock. (Even for superscalar designs, register scoreboarding
probably forces one per clock, unless the compiler gets clever.)

>    If the *possibly* unaligned datum is *actually* aligned, then a
>microcoded unaligned operation _might_ require only one cycle -- but
>the determination of alignment would probably be done so late in the
>pipeline that it would probably be easier to just require two pipeline
>slots for the unaligned load.

As someone else posted, at least the IBM 3090 does this with no time penalty.
There is also a rumor that the (new? unannounced?) Intel chips have zero
penalty if the unknown-alignment datum is actually aligned, and only one
extra cycle for real misalignment. Anyone know better?
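Whether a misaligned access costs an extra transfer at all depends on
whether it crosses a bus-width boundary, and that can be read off the low
address bits. A minimal sketch, assuming one transfer per bus-width block
touched and ignoring burst and cache effects; bus_transfers is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bus transfers needed to move `size` bytes starting at byte
 * address `addr` over a bus `bus_width` bytes wide: one per bus-width
 * block the access touches. Assumes 1 <= size <= bus_width. */
static unsigned bus_transfers(size_t addr, size_t size, size_t bus_width)
{
    assert(size >= 1 && size <= bus_width);
    size_t first = addr / bus_width;              /* block of first byte */
    size_t last = (addr + size - 1) / bus_width;  /* block of last byte */
    return (unsigned)(last - first + 1);
}
```

Under this model a halfword at address 1 on a 4-byte bus is misaligned but
still needs only one transfer, while a word at address 1 needs two.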

>    Such a model would only win if actually unaligned data occurred
>infrequently enough that you would only allocate one cycle, and be
>prepared to stall the pipeline (and insert another transfer) if the
>datum were unaligned.

How could this model lose? Can it *ever* do worse than the RISC must-align-
everything model?
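One way to frame the question is a simple issue-slot count. The sketch
below assumes, hypothetically, that misalignment is detected early enough
that a non-straddling access costs one slot and a straddling one costs one
extra stall cycle, while the must-align RISC always spends two
instructions on a possibly misaligned load. It deliberately ignores the
late-detection cost raised above; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a `size`-byte access at `addr` touches two bus-width blocks. */
static bool straddles(size_t addr, size_t size, size_t bus_width)
{
    assert(size >= 1 && size <= bus_width);
    return addr / bus_width != (addr + size - 1) / bus_width;
}

/* Optimistic hardware (hypothetical): one issue slot, plus one stall
 * cycle only when the datum actually crosses a bus boundary. */
static unsigned optimistic_cost(size_t addr, size_t size, size_t bus_width)
{
    return straddles(addr, size, bus_width) ? 2u : 1u;
}

/* Must-align RISC: alignment is unknown at compile time, so every
 * possibly misaligned load becomes a two-instruction sequence. */
static unsigned risc_cost(size_t addr, size_t size, size_t bus_width)
{
    (void)addr;
    (void)size;
    (void)bus_width;
    return 2u;
}

/* Sum a cost model over a sequence of load addresses. */
static unsigned total_cost(unsigned (*cost)(size_t, size_t, size_t),
                           const size_t *addrs, size_t n,
                           size_t size, size_t bus_width)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += cost(addrs[i], size, bus_width);
    return sum;
}
```

Under these assumptions the optimistic model never exceeds the RISC cost:
for 4-byte loads at addresses 0, 1, 4 and 6 on a 4-byte bus it totals 6
slots against 8.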

>Handling the overlapping case, case (1), inherently requires two bus
>transfers, and two bus transfers cost just about as much as two
>instructions.

This is not true.

Two bus transfers to successive words done at the same time can take
advantage of burst mode, etc. The transfers also typically happen only in
the memory-access stage of the pipeline. They should cost much less than
two full instructions, which together occupy two slots in every stage.

>Thing is, though, a processor with such a wide bus is probably so much
>damned faster than any external I/O device you have (external
>representation being the best justification for badly aligned data
>formats) that you probably don't care if it didn't try to optimize
>this case anyway.

But there are lots of applications that need to pack memory. I have
seen some really time-critical code that (essentially) does only
possibly-misaligned data accesses. It may be true that many, or even
most, applications have the freedom and luxury of padding at will, but
a significant fraction want performance with arbitrary alignment.
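As an illustration of packing, here is a hypothetical C record layout
with a 1-byte tag followed directly by a 4-byte value: 5 bytes per record,
versus the 8 bytes a naturally aligned struct of the same fields typically
occupies on common ABIs. memcpy is the portable C idiom for the misaligned
field; whether it compiles to one load or a multi-instruction sequence
depends on the target. REC_SIZE, put_record and get_value are illustrative
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout: 1-byte tag, then a 4-byte value, no
 * padding, so most values sit at offsets that are not multiples of 4. */
enum { REC_SIZE = 5, VALUE_OFFSET = 1 };

static void put_record(uint8_t *buf, size_t len, size_t rec,
                       uint8_t tag, uint32_t value)
{
    assert((rec + 1) * REC_SIZE <= len);  /* record must fit in buffer */
    buf[rec * REC_SIZE] = tag;
    /* memcpy has no alignment requirement, unlike *(uint32_t *)p. */
    memcpy(buf + rec * REC_SIZE + VALUE_OFFSET, &value, sizeof value);
}

static uint32_t get_value(const uint8_t *buf, size_t len, size_t rec)
{
    assert((rec + 1) * REC_SIZE <= len);
    uint32_t value;
    memcpy(&value, buf + rec * REC_SIZE + VALUE_OFFSET, sizeof value);
    return value;  /* host byte order, as put_record stored it */
}
```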


Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831		     ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh185
Me? Represent other people? Don't make them laugh so hard.