Path: utzoo!attcan!uunet!husc6!uwvax!oddjob!tank!uxc!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: Explanation, please!
Message-ID: <28200197@urbsdc>
Date: 7 Sep 88 13:14:00 GMT
References:
Lines: 20
Nf-ID: #R::-30:urbsdc:28200197:000:888
Nf-From: urbsdc.Urbana.Gould.COM!aglew Sep 7 08:14:00 1988

>I can imagine that on some machines it is faster to copy words into
>registers and repack the words in the registers rather than do a byte
>copy, since you could be taking advantage of some hardware gak.
>
>Simple example:
>
>    machine X has register W1 divided into B4, B5, B6, B7. To do a
>    copy, align the source pointer (doing byte copies), then read a
>    word at a time into the W1 register and write it back out by
>    writing B4, B5, B6, B7 (little-endian).
>
>This is beginning to look suspiciously like the kinds of optimizations
>that get done for bit BLTs. Anybody know if this ever really gets
>done?

Yep, it's done on Gould machines, for halfwords to words, in some copy
routines. However, on the NP1, for "typical" distributions of operands,
it turns out to be better to just copy in units of the greatest common
divisor of the source and destination alignments, using the appropriate
vector moves.
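For anyone who wants to see the two strategies side by side, here is a minimal C sketch. It is not Gould's code, and it rests on assumptions: a 4-byte word stands in for register W1, an 8-byte move is taken as the widest available, plain scalar loads and stores stand in for the NP1's vector moves, and the names `copy_word_repack` and `copy_common_alignment` are hypothetical. "Greatest common divisor of alignment" is read here as the largest power of two that divides both addresses, which is one interpretation among several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 4-byte word: stands in for register W1 (byte lanes B4..B7). */
enum { WORD = 4 };

/* Returns nonzero if the host stores the low-order byte of a word first. */
static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Quoted strategy (hypothetical name): byte-copy until the SOURCE is
 * word-aligned, then load a word at a time and store its byte lanes one
 * by one, so the destination may have any alignment. No overlap allowed. */
void copy_word_repack(unsigned char *dst, const unsigned char *src, size_t n)
{
    const int little = host_is_little_endian();

    /* Prologue: single bytes until src sits on a word boundary. */
    while (n > 0 && ((uintptr_t)src & (WORD - 1)) != 0) {
        *dst++ = *src++;
        n--;
    }
    /* Body: one aligned word load, then WORD byte stores. */
    while (n >= WORD) {
        assert(((uintptr_t)src & (WORD - 1)) == 0);
        uint32_t w;
        memcpy(&w, src, WORD);   /* compilers typically emit one load here */
        for (int lane = 0; lane < WORD; lane++) {
            /* Select the register lane holding memory byte `lane`. */
            int shift = 8 * (little ? lane : WORD - 1 - lane);
            dst[lane] = (unsigned char)(w >> shift);
        }
        src += WORD;
        dst += WORD;
        n -= WORD;
    }
    /* Epilogue: leftover bytes. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}

/* Reply's strategy, one reading (hypothetical name): move units of the
 * largest power of two, capped at 8, dividing BOTH addresses. Returns the
 * unit chosen. No overlap allowed. */
size_t copy_common_alignment(unsigned char *dst, const unsigned char *src, size_t n)
{
    const uintptr_t both = (uintptr_t)dst | (uintptr_t)src;
    size_t unit = 8;
    while (unit > 1 && (both & (unit - 1)) != 0)
        unit >>= 1;

    size_t i = 0;
    if (unit == 8) {
        for (; i + 8 <= n; i += 8) { uint64_t v; memcpy(&v, src + i, 8); memcpy(dst + i, &v, 8); }
    } else if (unit == 4) {
        for (; i + 4 <= n; i += 4) { uint32_t v; memcpy(&v, src + i, 4); memcpy(dst + i, &v, 4); }
    } else if (unit == 2) {
        for (; i + 2 <= n; i += 2) { uint16_t v; memcpy(&v, src + i, 2); memcpy(dst + i, &v, 2); }
    }
    for (; i < n; i++)   /* tail bytes, or the whole copy when unit == 1 */
        dst[i] = src[i];
    return unit;
}
```

The first routine mirrors the quoted example: only the source gets aligned, so each aligned load is split back into byte stores. The second never splits a load, but when the two addresses disagree in their low bits it drops all the way to byte moves, which is why its payoff depends on the operand distribution the reply mentions.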