Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!wuarchive!zaphod.mps.ohio-state.edu!uwm.edu!linac!midway!tank!stephen
From: stephen@estragon.uchicago.edu (Stephen P Spackman)
Newsgroups: comp.std.c
Subject: Re: memcpy
Message-ID:
Date: 22 Sep 90 23:12:37 GMT
References: <1990Sep19.021418.11574@maths.tcd.ie> <187@thor.UUCP>
Sender: news@midway.uchicago.edu (News Administrator)
Organization: University of Chicago CILS
Lines: 41
In-Reply-To: scjones@thor.UUCP's message of 21 Sep 90 13:19:29 GMT

In article <1990Sep19.021418.11574@maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes:
> Recently, while debugging the Unix version of unzip.c,
> I found a surprising discrepancy between 'memcpy' on various machines.
>
> In unzip.c it is assumed that the effect of
>       buf[0] = c;
>       memcpy(buf+1, buf, 20);
> is to set
>       buf[0] = buf[1] = buf[2] = ... = buf[21] = c.

[other people then comment about how this bug (in zip, not in Unix)
arises]

Actually, if you know how the compression algorithm used by Zip works,
you'll see that the "stupid" memcpy() does EXACTLY what is required.
The compression scheme itself relies on overwriting behaviour because
it works by copying forward stuff that is already "behind" the current
point in the buffer, but improves performance for CYCLIC data (of
which the bytewise uniform data a la memset() is only a special case)
by allowing the length to exceed the absolute value of the relative
source offset.

As to why this code works at all, it turns out that on most machines
the appropriate stupid implementation IS the fastest; in fact on most
of the CISC micros there's an instruction that does exactly that, and
does it very fast indeed (being an instruction, not a loop).
Furthermore, since it doesn't contain any transfers of control if it
arrives as an instruction, many compilers will inline it.
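
To make the overlap concrete, here is a minimal sketch in C of the
"stupid" front-to-back byte copy described above. The names
copy_forward and copy_forward_demo are hypothetical, chosen for
illustration; this is not code from unzip.c or from any C library. It
only assumes that the copy proceeds strictly one byte at a time in
ascending address order, which is exactly the property that memcpy()
does not promise for overlapping objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from unzip.c): copy n bytes strictly front
   to back, one byte at a time, so that an overlapping destination
   reads bytes written earlier in the same call. */
void copy_forward(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Expand a two-byte period into a longer run by copying with a length
   (5) greater than the source offset (2). */
void copy_forward_demo(void)
{
    unsigned char buf[8] = { 'a', 'b' };   /* remaining bytes are zero */

    copy_forward(buf + 2, buf, 5);
    assert(memcmp(buf, "abababa", 8) == 0);
}
```

With buf[0] = c, the call copy_forward(buf + 1, buf, 20) writes
buf[1] through buf[20], so buf[0] through buf[20] all end up equal to
c. A memmove()-style copy of the same arguments would instead shift
the original 20 bytes up by one position.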

So what the situation amounts to is an assumption on the part of the
programmer that, given the freedom to implement memcpy() however they
like in this case, any "sane" implementor would do it the "easy" way -
which is precisely what the algorithm needs. Where this falls down, of
course, is that (a) a VERY CISCy machine may provide memmove()
semantics in microcode; and that (b) a very fast machine (or a
hand-coded routine for a RISC) might do all of its string moves in
bus-width chunks and without cache interlocks, and produce very
interesting gibberish indeed.

stephen p spackman  stephen@estragon.uchicago.edu  312.702.3982