Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!ncar!ico!dougp
From: dougp@ico.isc.com (Doug Pintar)
Newsgroups: comp.unix.i386
Subject: Re: Archive Tapes
Message-ID: <1990May7.210656.6841@ico.isc.com>
Date: 7 May 90 21:06:56 GMT
References: <29490@cup.portal.com> <261@bradf.UUCP>
Reply-To: dougp@ico.ISC.COM (Doug Pintar)
Organization: Interactive Systems Corp., Boulder CO
Lines: 64

In article <261@bradf.UUCP> brad@bradf.UUCP (Bradley W. Fisher) writes:
>... things do revolve around the buffer size. As I understand it, this is
>the amount of data transferred from origin to RAM before it is transferred
>to the final destination. Also, as I understand it, this has been declared
>by AT&T source license code to be 10k (or 20 512-byte blocks, hence the
>usual blocking factor of 20) for tar. This was probably quite adequate for
>older start/stop reel and cartridge systems.

I think you're confusing application code and kernel code here. If you use the '-B' option on cpio, it gives you 5120-byte records (10 Unix-standard 512-byte blocks). The '-C' option will let you use a bigger block size, or you can run through 'dd' to do the buffering.

I've never used the SCO system, so I may be incorrect in the following conjecture; take it with however much salt you wish...

I suspect that the difference in performance you are seeing between (fundamentally) unblocked cpio/tar transfers on SCO and on the other systems comes from the SCO tape driver buying a large buffer and hiding the buffering operation from the application program. We (Interactive) rejected this approach for two reasons:

1) For dumb (single address, single count) DMA tape controllers, you need to have PHYSICALLY-CONTIGUOUS memory for your buffer. Large chunks of this become difficult to find after the system has been running for any length of time, so you are usually forced to buy the pages at INIT time. This removes that memory from user programs WHETHER OR NOT THE TAPE IS BEING USED!
2) The original philosophy for Unix was (and still should be, IMHO) that things that can be done in user code SHOULD be done in user code, not in the kernel. Since 'dd' existed for buffering (although it tends to hide end-of-tape detection even more, sigh) and the latest cpio supports the -C option, there is no real win in attaching a comparatively expensive resource (memory) to an I/O device just so that programs not using large buffers run fast.

>Various companies that have licensed the source to *NIX either have or have
>not addressed this problem, and hacked the source for tar to increase the
>buffer size. It seems Interactive falls into the latter category. However,
>SCO falls into the former ... their "blocking factor" of 20 *I believe* is
>really a multiple of ten ... and about 100k is being transferred at a time.
>With fewer stops for transfer of data this results in an overall rate increase.
>

See above comment...

>Now for the clincher ... how do you keep it streaming? Well, going out to
>tape (the slowest device in the picture) you would ideally fill one RAM buffer
>area with data from the disk and feed it to another RAM buffer area (are we
>talking pipes here?) that is in control of feeding the tape drive. I think
>from what I've read, that to be able to do this involves the use of "shared
>memory", and in brief I've also been told "you can't do that with the Intel
>architecture".

There is nothing in the Intel architecture that prevents having shared memory between two processes; if there were, 386-based Unix systems would never pass the System V Verification (Validation?) Suite (SVVS, required if you're going to call something Unix). You could indeed write two cooperating processes, one of which fills memory from disk while the other writes it to tape. This is known as double-buffering, and it is a pain to do under Unix, as it requires two processes and shared memory instead of asynchronous I/O as God intended.

I guess most systems writers never considered it a big enough problem to bother rewriting the backup programs. Using large buffers will cause the tape to stream for quite a time, stop a little, and then stream again, so it saves BUNCHES of time over writing little teensy records.

BTW, AIX PS/2 (at least version 1.2) DOES have a 2-task cpio.

Besides, if backups didn't take forever, where would all the grave-shift operations people find work? :-)

DLP