Xref: utzoo comp.lang.c:28431 comp.lang.misc:4962 comp.sys.ibm.pc:50006 comp.sys.ibm.pc.programmer:1311
Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!ames!dftsrv!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.lang.c,comp.lang.misc,comp.sys.ibm.pc,comp.sys.ibm.pc.programmer
Subject: fast file copying (was questions about a backup program ...)
Keywords: copy
Message-ID: <24164@mimsy.umd.edu>
Date: 4 May 90 07:21:03 GMT
References: <255@uecok.UUCP> <1990Apr25.125806.20450@druid.uucp> <12578@wpi.wpi.edu>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 54

In article <12578@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
>Interestingly, this aspect of the copy program [reading and writing very
>large blocks] is one place where I think DOS is sometimes faster than
>UNIX.  I suspect that many UNIX versions of 'cp' use block-sized buffers.
>Doing so makes overly pessimistic assumptions about the amount of
>physical memory you're likely to get.  

None of the newsgroups to which this is posted are particularly suited
to discussions about O/S level optimisation of file I/O, but I feel
compelled to point out that `big gulp' style copying is not always, and
indeed not often, the best way to go about things.  The optimal point
is often not `read the whole file into memory, then write it out of
memory', because this requires waiting for the entire file to come in
before figuring out where to put the new blocks for the output file.
It is better to get computation done while waiting for the disk to transfer
data, whenever this can be done without `getting behind'.  Unix systems
use write-behind (also known as delayed write) schemes to help out here;
writers need use only block-sized buffers to avoid user-to-kernel copy
inefficiencies.
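
For concreteness, here is a minimal sketch of a block-at-a-time copy
loop, assuming a POSIX system where fstat() fills in st_blksize.  The
name copy_fd and the 8192-byte fallback are mine, purely for
illustration; this is not what any particular `cp' does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy everything from descriptor `in' to descriptor `out' through a
 * single buffer of the size the system reports as preferred for `out'.
 * Returns 0 on success, -1 on error.  (Hypothetical helper.)
 */
int copy_fd(int in, int out)
{
    struct stat st;
    size_t bufsize = 8192;      /* assumed fallback if fstat is no help */
    char *buf;
    int rc = 0;

    assert(in >= 0 && out >= 0);
    if (fstat(out, &st) == 0 && st.st_blksize > 0)
        bufsize = (size_t)st.st_blksize;
    if ((buf = malloc(bufsize)) == NULL)
        return -1;

    while (rc == 0) {
        ssize_t n = read(in, buf, bufsize);
        char *p = buf;

        if (n == 0)
            break;              /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            rc = -1;
            break;
        }
        while (n > 0) {         /* write() may accept less than asked */
            ssize_t w = write(out, p, (size_t)n);

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                break;
            }
            p += w;
            n -= w;
        }
    }
    free(buf);
    return rc;
}
```

With write-behind in the kernel, each write() here can return as soon
as the data has been copied, so the next read() overlaps the actual
disk transfer; a larger buffer would not obviously buy anything more.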

As far as comp.lang.c goes, the best one can do here is call fread()
and fwrite() with fairly large buffers, since standard C provides nothing
more `primitive' or `low-level', nor does it give the programmer a way
to find a good buffer size.  Better stdio implementations will do well
with large fwrite()s, although there may be no way for them to avoid
memory-to-memory copies on fread().  A useful fwrite() implementation
trick goes about like this:

	set resid = number of bytes to write;
	set p = base of bytes to write;
	while (resid) {
		if (there is stuff in the output buffer ||
		    resid < output_buffer_size) {
			n = MIN(resid, space_in_output_buffer);
			move n bytes from p to buffer;
			p += n;
			resid -= n;
			if (buffer is full)
				if (fflush(output_file)) goto error;
		} else {
-->			n_written = write output_buffer_size bytes directly from p;
			if this fails, goto error;
			p += n_written;
			resid -= n_written;
		}
	}

The `trick' is in the line marked with the arrow --> : there is no
need to copy bytes into an internal buffer just to write them, at
least in most systems.  (Some O/Ses may `revoke' access to pages that
are being written to external files.)
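
The pseudocode above might be rendered in C roughly as follows.  This
is only a sketch on top of POSIX write(): struct ostream, OBUF_SIZE,
write_all, os_flush and os_write are all hypothetical names standing in
for the insides of a stdio, not any real implementation's.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define OBUF_SIZE 4096          /* assumed; a real stdio picks its own */

struct ostream {                /* hypothetical stand-in for FILE */
    int fd;
    size_t used;                /* bytes currently held in buf */
    char buf[OBUF_SIZE];
};

/* Write exactly len bytes, retrying short writes.  0 = ok, -1 = error. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, p, len);

        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Push out whatever is buffered.  0 = ok, -1 = error. */
int os_flush(struct ostream *s)
{
    if (write_all(s->fd, s->buf, s->used) != 0)
        return -1;
    s->used = 0;
    return 0;
}

/* Returns the number of bytes accepted; less than len means error. */
size_t os_write(struct ostream *s, const void *data, size_t len)
{
    const char *p = data;
    size_t resid = len;

    assert(s != NULL && s->used <= OBUF_SIZE);
    while (resid > 0) {
        if (s->used > 0 || resid < OBUF_SIZE) {
            /* must go through the buffer to keep bytes in order */
            size_t space = OBUF_SIZE - s->used;
            size_t n = resid < space ? resid : space;

            memcpy(s->buf + s->used, p, n);
            s->used += n;
            p += n;
            resid -= n;
            if (s->used == OBUF_SIZE && os_flush(s) != 0)
                break;
        } else {
            /* the --> line: buffer empty, a full block is ready */
            if (write_all(s->fd, p, OBUF_SIZE) != 0)
                break;
            p += OBUF_SIZE;
            resid -= OBUF_SIZE;
        }
    }
    return len - resid;
}
```

Given a 10-byte os_write followed by a 9000-byte one, the second call
tops up and flushes the buffer, sends the next 4096 bytes straight from
the caller's memory, and leaves the 818-byte tail buffered.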
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris