Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!bnrgate!bigsur!bnr-rsc!bcarh185!schow
From: schow@bcarh185.bnr.ca (Stanley T.H. Chow)
Newsgroups: comp.arch
Subject: Handling mis-alignment (was Re: RISC Machine Data Structure Word Alignment Problems?
Message-ID: <2038@bnr-rsc.UUCP>
Date: 2 Feb 90 19:17:32 GMT
References: <3428@odin.SGI.COM>
Sender: news@bnr-rsc.UUCP
Reply-To: bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow)
Organization: BNR Ottawa, Canada
Lines: 64
Summary:
Followup-To:
Keywords:

In article aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>
>Microcoded unaligned data takes two cycles to load an unaligned datum.
>(Assuming the unaligned datum overlaps two data bus widths.) MIPSco
>style load-left and load-right take two cycles to load the same
>unaligned datum.

As you point out later, a lot depends on the actual alignment in
relation to the bus. A lot also depends on the hardware available.
It is not true that all microcode (or H/W) takes two cycles. It is
true that every RISC ISA announced to date takes a minimum of two
instructions.

Also, note that for microcode or H/W, the extra cycles (if any) may
well be hidden in some pipeline stages, whereas the RISC instructions
must be issued one per clock. (Even for superscalar stuff, register
scoreboarding probably forces one per clock, unless the compiler gets
clever.)

> If the *possibly* unaligned datum is *actually* aligned, then a
>microcoded unaligned operation _might_ require only one cycle -- but
>the determination of alignment would probably be done so late in the
>pipeline that it would probably be easier to just require two pipeline
>slots for the unaligned load.

As someone else posted, at least the IBM 3090 does this at no time
penalty. There is also a rumor that the (new? unannounced?) Intel
chips have zero penalty if the unknown-alignment datum is actually
aligned, and that the penalty for real misalignment is only one extra
cycle. Anyone know better?

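To make the "alignment in relation to the bus" point concrete, here is
a minimal sketch in C. It is only an illustration under stated
assumptions: a byte-addressed memory, a 32-bit datum, and little-endian
byte numbering within a word. The names bus_transfers and
load_u32_merge are hypothetical and do not describe any particular
machine or ISA. The first function counts how many bus-width words an
access touches; the second models the "two aligned loads plus merge"
sequence a must-align machine needs, with a one-load fast path when the
datum happens to be aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many bus-width words does an access of
 * `size` bytes at byte address `addr` touch? */
unsigned bus_transfers(uint32_t addr, unsigned size, unsigned bus_bytes)
{
    assert(size != 0 && bus_bytes != 0);
    /* 64-bit arithmetic so addr + size - 1 cannot wrap around. */
    uint64_t first = (uint64_t)addr / bus_bytes;
    uint64_t last  = ((uint64_t)addr + size - 1) / bus_bytes;
    return (unsigned)(last - first + 1);
}

/* Hypothetical model of an unaligned 32-bit load built from aligned
 * word loads. `mem` is an array of aligned 32-bit words; byte k of a
 * word is taken to be bits 8k..8k+7 (an assumed little-endian
 * numbering, independent of the host). */
uint32_t load_u32_merge(const uint32_t *mem, uint32_t addr)
{
    uint32_t idx = addr / 4;   /* aligned word holding the first byte */
    uint32_t off = addr % 4;   /* byte offset within that word */
    uint32_t lo  = mem[idx];

    if (off == 0)              /* actually aligned: one load suffices */
        return lo;

    uint32_t hi = mem[idx + 1];            /* second aligned load */
    return (lo >> (8 * off)) | (hi << (32 - 8 * off));
}
```

With a 4-byte bus, bus_transfers(2, 4, 4) gives 2, while the same datum
on an 8-byte bus, bus_transfers(2, 4, 8), gives 1. That is the sense in
which the cost depends on the bus and not just on the address.
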
> Such a model would only win if actually unaligned data occurred
>infrequently enough that you would only allocate one cycle, and be
>prepared to stall the pipeline (and insert another transfer) if the
>datum were unaligned.

How could this model lose? Can it *ever* do worse than the RISC
must-align-everything model?

>Handling the overlapping case, case (1), inherently requires two bus
>transfers, and two bus transfers cost just about as much as two
>instructions.

This is not true. Two bus transfers to successive words done at the
same time can take advantage of burst transfers, etc. The transfers
also typically happen in the memory-access stage of the pipeline. That
should cost much less than two full instructions that take two slots
everywhere.

>Thing is, though, a processor with such a wide bus is probably so much
>damned faster than any external I/O device you have (external
>representation being the best justification for badly aligned data
>formats) that you probably don't care if it didn't try to optimize
>this case anyway.

But there are lots of applications that need to pack memory. I have
seen some really time-critical code that (essentially) does nothing
but possibly misaligned data accesses. It may be true that many, or
even most, applications have the freedom and luxury of padding at
will, but a significant fraction want performance with arbitrary
alignment.

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831               ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh185
Me? Represent other people? Don't make them laugh so hard.