Path: utzoo!attcan!uunet!husc6!uwvax!oddjob!tank!uxc!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: Explanation, please!
Message-ID: <28200197@urbsdc>
Date: 7 Sep 88 13:14:00 GMT
References: <dpmuY#2EBC4R=eric@snark.UUCP>
Lines: 20
Nf-ID: #R:<dpmuY#2EBC4R=eric@snark.UUCP>:-30:urbsdc:28200197:000:888
Nf-From: urbsdc.Urbana.Gould.COM!aglew    Sep  7 08:14:00 1988


>I can imagine that on some machines it is faster to copy words into
>register and repack the words in the registers rather than do a byte
>copy, since you could be taking advantage of some hardware gak.
>
>Simple example:
>
>    machine X has register W1 divided into B4, B5, B6, B7.  To do a
>    copy, align the source pointer (doing byte copies) then read a
>    word-at-a-time into the W1 register, write it back out by writing
>    B4, B5, B6, B7  (little-endian).
>
>This is beginning to look suspiciously like the kinds of optimizations
>that get done for bit BLTs.  Anybody know if this ever really gets
>done?

Yep, it's done on Gould machines for halfwords to words in some copy
routines. However, on the NP1, for "typical" distributions of operands,
it turns out to be better to just copy at the greatest common alignment
of the operands (the largest unit that both source and destination are
aligned to), using the appropriate vector moves.
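
For anyone who wants to see the shape of it, here is a rough sketch in
portable C of "copy at the greatest common alignment". It is not the Gould
routine: the names common_alignment and copy_common are made up for
illustration, the 8-byte cap is an assumption, and plain scalar loops stand
in for the NP1 vector moves. The unit is the largest power of two (up to 8)
that divides the source address, the destination address, and the length,
so no fix-up loop is needed at either end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: largest power-of-two unit (1, 2, 4 or 8 bytes)
   that divides both addresses and the byte count. OR-ing the values
   together and taking the lowest set bit gives exactly that; the "| 8"
   caps the answer at 8. */
size_t common_alignment(const void *dst, const void *src, size_t n)
{
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src | (uintptr_t)n | 8u;
    return (size_t)(bits & (~bits + 1u));   /* isolate lowest set bit */
}

/* Hypothetical copy routine: moves n bytes in units of the common
   alignment. The fixed-size memcpy calls are just a well-defined way to
   express one load and one store of that width; scalar loops here stand
   in for whatever vector move the hardware offers. Regions must not
   overlap. */
void copy_common(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    switch (common_alignment(dst, src, n)) {
    case 8:
        for (i = 0; i < n; i += 8) {
            uint64_t w;
            memcpy(&w, s + i, 8);
            memcpy(d + i, &w, 8);
        }
        break;
    case 4:
        for (i = 0; i < n; i += 4) {
            uint32_t w;
            memcpy(&w, s + i, 4);
            memcpy(d + i, &w, 4);
        }
        break;
    case 2:
        for (i = 0; i < n; i += 2) {
            uint16_t w;
            memcpy(&w, s + i, 2);
            memcpy(d + i, &w, 2);
        }
        break;
    default:
        for (i = 0; i < n; i++)     /* byte-at-a-time fallback */
            d[i] = s[i];
        break;
    }
}
```

The point of the design is that there is no realign-and-repack step at
all: if the operands only agree on halfword alignment, you move halfwords
and leave it at that. Whether this beats the repacking trick depends on
the operand mix, as noted above.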