Xref: utzoo comp.lang.c:12373 comp.arch:6267
Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!mailrus!cornell!uw-beaver!uw-june!rik
From: rik@june.cs.washington.edu (Rik Littlefield)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Summary: Non-aligned copies done efficiently with word ops.
Message-ID: <5658@june.cs.washington.edu>
Date: 6 Sep 88 19:03:38 GMT
References: <5654@june.cs.washington.edu>
Organization: U of Washington, Computer Science, Seattle
Lines: 21

In article <5654@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> 
> I can imagine that on some machines it is faster to copy words into
> register and repack the words in the registers rather than do a byte
> copy, since you could be taking advantage of some hardware gak.
> 

On the old CDC 6000-series machines (early RISCs...) that was the *only*
practical way to do it, as well as being blazingly fast.  We had copies
that would handle arbitrary *bit* alignments.  In the middle of the
string they cost around 6 instructions and 2 memory references per
60-bit word.  The sequence was basically fetch, shift, mask, mask, OR,
and store, rearranged of course to minimize memory delay and functional
unit conflicts.  I vaguely remember that the loop could even be unrolled
a couple of times and still fit in the instruction cache ("stack", in
those days) on machines expensive enough to have one.
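
For the comp.lang.c crowd, here is a rough C sketch of the same idea for
the middle of the string.  It is not the CDC code: the 64-bit word size,
the name shifted_word_copy, and the convention that `shift` counts bits
from the most significant end of src[0] are all my assumptions for
illustration.  Each output word takes one fetch, two shifts, an OR, and
a store.  The explicit masks drop out here because C's unsigned shifts
zero-fill.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nwords 64-bit words into dst from a bit stream that begins
 * `shift` bits into src[0], counting from the most significant bit.
 * Requires 0 <= shift < 64.  When shift != 0 the loop reads
 * src[0] .. src[nwords], so src must have nwords + 1 readable words.
 * Hypothetical helper for illustration; handles whole words only.
 */
void shifted_word_copy(uint64_t *dst, const uint64_t *src,
                       size_t nwords, unsigned shift)
{
    size_t i;
    uint64_t prev, next;

    if (shift == 0) {
        /* Aligned case: a plain word copy.  Also avoids the
         * undefined 64-bit shift by 64 below. */
        for (i = 0; i < nwords; i++)
            dst[i] = src[i];
        return;
    }

    prev = src[0];
    for (i = 0; i < nwords; i++) {
        next = src[i + 1];                    /* fetch */
        dst[i] = (prev << shift)              /* tail of previous word */
               | (next >> (64 - shift));      /* head of next word, OR */
        prev = next;                          /* reuse: one fetch per word */
    }
}
```

Carrying `prev` across iterations is what keeps it to two memory
references per word (one fetch, one store).  Partial words at the two
ends of the string would need separate masked handling, which this
sketch leaves out.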

VAXen I don't know about for sure, but I'd be real surprised if their
microcode didn't do the same thing.

--Rik