Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!ncrlnk!ncrstp!mercer From: mercer@ncrstp.StPaul.NCR.COM (Dan Mercer) Newsgroups: comp.unix.wizards Subject: Re: fastest way to copy hunks of memory Message-ID: <420@ncrstp.StPaul.NCR.COM> Date: 3 May 90 22:09:53 GMT References: <5531@helios.ee.lbl.gov> Reply-To: mercer@ncrstp.StPaul.NCR.COM (Dan Mercer) Organization: St. Paul Lines: 87 In article <5531@helios.ee.lbl.gov> tierney@ux1.lbl.gov (Brian Tierney) writes: : :Which method is fastest? : :1. char *p1, *p2; : : for (j = 0; j < size; j++) : *p1++ = *p2++; : :2. bcopy(p2,p1,size); : :3. memcpy(p1,p2,size); : :In general, system calls are slower, right? (ie, 1 faster that 2 and 3) : :BTW, what's the difference between bcopy and memcpy anyway?? : :THANKS. : : :/---------------------------------------v-------------------------------------\ :| Brian Tierney, Computer Graphics Lab | internet: tierney@ux1.lbl.gov | :| Lawrence Berkeley Laboratory | or BLTierney@lbl.gov | :| "I am her hero, and she is my heroin": Rocky Ericson | It's c language implementation independent. Bcopy or memcpy may be implemented as inline assembly (as strcpy is on the NCR Tower's SVr3 machines if is included). They may also be implemented in other fashions that are quicker than standard C function calls. If the hardware architecture supports move instructions the savings can be significant. for (j = 0; j < size; j++) *p1++ = *p2++; At the minimal, this breaks down to: exclusive or register x against register x - 1 cycle load register a with size - 1 cycle label: move character based at value in x plus base y to position at x plus base z - 1 cycle increment x - 1 cycle compare x to a - 1 cycle branch not equal to label - 1 cycle Cost = 2 + (size * 4) cycles In fact, if the architecture supports a branch and decrement instruction, the following c code would be faster - for (j=size;j;j--) *p1++ = *p2++; load register x with size - 1 cycle label: move character based at value in x plus base y to position at x plus base z - 1 cycle decrement x and branch to label if not zero - 1 cycle Cost = 2 + (size * 2) cycles However, if the bcopy or memcpy is expanded inline and the architecture support move character instructions, the savings may be far greater. For instance, we have a processor that does an MVC - move characters. While the bytes being moved are not on full words, or the length to be moved is less that a full word, one cycle is used for each byte. But if full words can be moved, only one cycle is required to move each 4 byte chunk. Socrates once came upon a bunch of philosophers arguing about how many teeth a horse had. One group argued for a certain number based on the length of the horses snout, a second upon the diet the horse ate, a third group based upon number theory. Socrates disappeared for a minute and came back and told them the exact number. They were incensed. How could he be so sure of his number, when he hadn't even offered a theory to explain it. Simple, he replied, while they were arguing, he had gone back to the stables and counted. So, if you want the real answer, I suggest you do the same. If you can't decipher the answer from the assembly generated, put together a nice looping package and try that. -- Dan Mercer Reply-To: mercer@ncrstp.StPaul.NCR.COM (Dan Mercer)