Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!wuarchive!zaphod.mps.ohio-state.edu!uwm.edu!linac!midway!tank!stephen
From: stephen@estragon.uchicago.edu (Stephen P Spackman)
Newsgroups: comp.std.c
Subject: Re: memcpy
Message-ID: <STEPHEN.90Sep22181237@estragon.uchicago.edu>
Date: 22 Sep 90 23:12:37 GMT
References: <1990Sep19.021418.11574@maths.tcd.ie> <187@thor.UUCP>
Sender: news@midway.uchicago.edu (News Administrator)
Organization: University of Chicago CILS
Lines: 41
In-Reply-To: scjones@thor.UUCP's message of 21 Sep 90 13:19:29 GMT

In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes:
> Recently, while debugging the Unix version of unzip.c,
> I found a surprising discrepancy between 'memcpy' on various machines.
> 
> In unzip.c it is assumed that the effect of
> 	buf[0] = c;
> 	memcpy(buf+1, buf, 20);
> is to set
> 	buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

[other people then comment about how this bug (in zip, not in Unix)
arises]

Actually, if you know how the compression algorithm used by Zip works,
you'll see that the "stupid" memcpy() does EXACTLY what is required.
The compression scheme itself relies on overwriting behaviour because
it works by copying forward stuff that is already "behind" the current
point in the buffer, but improves performance for CYCLIC data (of
which the bytewise uniform data a la memset() is only a special case)
by allowing the length to exceed the absolute value of the relative
source offset.
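
To make the overlap requirement concrete, here is a minimal sketch of
the kind of forward copy the decompressor needs. It is not taken from
unzip.c; the name copy_match and its parameters are hypothetical, made
up for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from unzip.c): append len bytes at buf[pos],
 * taking them from dist bytes behind pos.  The copy runs strictly one
 * byte at a time in ascending order, so when len > dist the bytes just
 * written are read again and the last dist bytes repeat cyclically.
 */
static void copy_match(unsigned char *buf, size_t pos, size_t dist, size_t len)
{
    unsigned char *dst = buf + pos;
    const unsigned char *src = dst - dist;

    while (len-- > 0)
        *dst++ = *src++;
}
```

With buf holding "abc", copy_match(buf, 3, 3, 6) extends it to
"abcabcabc"; with dist == 1 it degenerates into the memset()-like case
from the quoted article.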

As to why this code works at all, it turns out that on most machines
the appropriate "stupid" implementation (a plain ascending
byte-by-byte copy) IS the fastest; in fact on most of the CISC micros
there's an instruction that does exactly that, and does it very fast
indeed (being an instruction, not a loop).

Furthermore, when it arrives as a single instruction it contains no
transfers of control, so many compilers will inline it.

So what the situation amounts to is an assumption on the part of the
programmer: given that the implementor is free to handle the
overlapping case of memcpy() however they like, any "sane" implementor
would do it the "easy" way, which is precisely what the algorithm
needs. Where this falls down, of course, is that (a) a VERY CISCy
machine may provide memmove() semantics in microcode; and that (b) a
very fast machine (or a hand-coded routine for a RISC) might do all of
its string moves in bus-width chunks and without cache interlocks, and
produce very interesting gibberish indeed.
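
Case (b) can be illustrated with a small sketch. This is a hypothetical
chunked copy, not any real library's memcpy(); the name chunked_copy
and the 4-byte chunk size are assumptions chosen purely for
illustration, and it ignores caches entirely.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical word-at-a-time copy: each 4-byte chunk is read in full
 * before any of it is written, the way a bus-width load/store pair
 * would behave.  Leftover bytes are copied singly.
 */
static void chunked_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n >= 4) {
        unsigned char tmp[4];
        memcpy(tmp, src, 4);   /* load the whole chunk first */
        memcpy(dst, tmp, 4);   /* then store the whole chunk */
        src += 4;
        dst += 4;
        n -= 4;
    }
    while (n-- > 0)
        *dst++ = *src++;
}
```

Starting from buf = "c........", the call chunked_copy(buf+1, buf, 8)
leaves "cc.......": the 'c' is propagated exactly once, whereas the
byte-at-a-time copy would have produced "ccccccccc". Neither result is
what memmove() would give, and a conforming memcpy() is entitled to
produce either.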

stephen p spackman  stephen@estragon.uchicago.edu  312.702.3982