Xref: utzoo comp.lang.c:12382 comp.arch:6273
Path: utzoo!yunexus!geac!syntron!jtsv16!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!rutgers!mit-eddie!uw-beaver!uw-june!pardo
From: pardo@june.cs.washington.edu (David Keppel)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Message-ID: <5654@june.cs.washington.edu>
Date: 6 Sep 88 17:25:50 GMT
Article-I.D.: june.5654
References: <638@paris.ics.uci.edu> <dpmuY#2EBC4R=eric@snark.UUCP> <566@pcrat.UUCP> <9087@pur-ee.UUCP>
Reply-To: pardo@uw-june.UUCP (David Keppel)
Organization: U of Washington, Computer Science, Seattle
Lines: 34

hankd@pur-ee.UUCP (Hank Dietz) writes:
>	if ((p - q) & 3) *byte copy* else *struct copy*
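
In C, a rough sketch of that test might look like the following.  It
assumes 4-byte words and a flat address space where converting a pointer
to uintptr_t keeps the low address bits; the name same_word_phase is
made up for illustration and is not from Hank's article.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when p and q sit at the same offset
   within a 4-byte word, so that aligning one also aligns the other.
   Assumes uintptr_t preserves the low address bits of a pointer. */
static int same_word_phase(const void *p, const void *q)
{
    return (((uintptr_t)p - (uintptr_t)q) & 3u) == 0;
}
```

When it returns zero, no amount of leading byte copy can bring both
pointers onto a word boundary at once.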

I believe that the VAX "movc" instructions (movc3/movc5) take arbitrary
pointers and do the following:

* If both are word-aligned, do a word copy (by "word" I mean 4 bytes,
  a VAX longword).
* If both are misaligned by the same amount, so that 1, 2, or 3 bytes
  of byte copy at either end would align them, then do byte copies at
  the ends and a word copy down the middle.
* If neither can be aligned that way (the two are misaligned by
  different amounts), then ??
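
To make the first two cases concrete, here is a minimal C sketch of a
head/middle/tail copy.  It is not a description of what the VAX
actually does inside movc; the function name copy_same_phase is
hypothetical, the buffers are assumed not to overlap, and the "??" case
is simply handled with a plain byte copy as a placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: copy n bytes from src to dst (non-overlapping), using
   4-byte moves when both pointers share the same offset within a word. */
static void copy_same_phase(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    if ((((uintptr_t)dst - (uintptr_t)src) & 3u) != 0) {
        /* The open "??" case: offsets differ, fall back to bytes. */
        while (n--) *dst++ = *src++;
        return;
    }
    /* Head: 0..3 byte copies until both pointers are word-aligned. */
    while (n > 0 && ((uintptr_t)src & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }
    /* Middle: whole words.  A fixed 4-byte memcpy stands in for one
       aligned load and store and sidesteps C aliasing rules. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, 4);
        memcpy(dst, &w, 4);
        src += 4; dst += 4; n -= 4;
    }
    /* Tail: 0..3 leftover bytes. */
    while (n--) *dst++ = *src++;
}
```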

Unfortunately, my VAX hardware reference is out of town for a couple
of weeks, so I can't ask him about the neither-aligned case.  Does
anybody know?

I can imagine that on some machines it is faster to copy words into
registers and repack them there than to do a byte copy, since you
could take advantage of some hardware gak.

Simple example:

    Machine X has a register W1 divided into byte fields B4, B5, B6,
    B7.  To do a copy, align the source pointer (doing byte copies),
    then read a word at a time into the W1 register and write it back
    out by writing B4, B5, B6, B7 in turn (little-endian).
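
A C approximation of that scheme, for what it's worth.  Machine X is
imaginary, so everything here is an assumption: the union w1 models the
register with its byte fields, copy_src_aligned is a made-up name, and
the buffers must not overlap.  Whether this beats a plain byte copy
depends entirely on the hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models "register W1 divided into B4..B7": one 32-bit word whose
   bytes can also be addressed individually, in memory order. */
union w1 {
    uint32_t      word;
    unsigned char byte[4];   /* byte[0]..byte[3] stand in for B4..B7 */
};

/* Sketch only: non-overlapping copy that aligns the SOURCE, reads it a
   word at a time, and writes each word back out one byte at a time. */
static void copy_src_aligned(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    union w1 r;

    /* Byte copies until the source pointer is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)src & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }
    /* One word read per four byte writes; dst may be at any alignment. */
    while (n >= 4) {
        memcpy(&r.word, src, 4);   /* stands in for an aligned word load */
        dst[0] = r.byte[0];
        dst[1] = r.byte[1];
        dst[2] = r.byte[2];
        dst[3] = r.byte[3];
        src += 4; dst += 4; n -= 4;
    }
    /* Leftover 0..3 bytes. */
    while (n--) *dst++ = *src++;
}
```

The point is that memory reads drop to one per word even when the
destination cannot be aligned; only the writes stay byte-sized.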

This is beginning to look suspiciously like the kinds of optimizations
that get done for bit BLTs.  Anybody know if this ever really gets
done?

	;-D on  ( Ahh.  Architecture at its finest )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo