Newsgroups: comp.unix.ultrix
Path: utzoo!utgpu!watserv1!watcgl!idallen
From: idallen@watcgl.waterloo.edu (Ian! D. Allen [CGL])
Subject: Re: _slow_ rdump
Message-ID: <1990Oct15.034337.21119@watcgl.waterloo.edu>
Summary: Everything I know about dump and rdump on Ultrix
Organization: Computer Graphics Laboratory, University of Waterloo, Ontario, Canada
References: <8844@pitt.UUCP> <1990Oct14.205942.27205@decuac.dec.com>
Date: Mon, 15 Oct 90 03:43:37 GMT
Lines: 203

Here's stuff on dump/rdump I sent to comp.unix.ultrix last summer.
We run Ultrix 3.1 and 3.1C.

From idallen Thu Jun 7 21:42:39 1990
To: comp.unix.ultrix
Subject: Why isn't dump maximally efficient with TK70 tapes?

DECsystem 5400, Ultrix 3.1C, RA90 disk, one user (me).
Watch the elapsed real times here.

Here's a plain root dump to tape (TK70):

    # time dump 0 /
    DUMP: Date of this level 0 dump: Thu Jun 7 21:17:43 1990
    DUMP: Date of last level 0 dump: the epoch
    DUMP: Dumping /dev/rra0a (/) to /dev/rmt0h
    DUMP: Mapping (Pass I) [regular files]
    DUMP: Mapping (Pass II) [directories]
    DUMP: Estimates based on 1200 feet of tape at a density of 10240 BPI...
    DUMP: This dump will occupy 1103 (10240 byte) blocks on 0.13 tape(s).
    DUMP: Dumping (Pass III) [directories]
    DUMP: Dumping (Pass IV) [regular files]
    DUMP: 57.43% done, finished in 0:03
    DUMP: 1103 tape blocks were dumped on 1 tape(s)
    DUMP: Tape rewinding
    DUMP: Dump is done
    0% real=9:29 usr=0.3 sys=1.9 rd=0 wr=4 mem=56 pg=3 rec=17 sw=0 sig=0 cs=2776

Here's the identical root dump piped to dd to tape:

    recorder# mt rew
    recorder# time sh -c "dump 0f - / | dd bs=32k rbuf=2 wbuf=2 of=/dev/rmt0h"
    DUMP: Date of this level 0 dump: Thu Jun 7 21:28:18 1990
    DUMP: Date of last level 0 dump: the epoch
    DUMP: Dumping /dev/rra0a (/) to standard output
    DUMP: Mapping (Pass I) [regular files]
    DUMP: Mapping (Pass II) [directories]
    DUMP: Estimated 11295744 bytes output to Standard Output
    DUMP: Dumping (Pass III) [directories]
    DUMP: Dumping (Pass IV) [regular files]
    DUMP: 11295744 bytes were dumped to Standard Output
    DUMP: Dump is done
    0+2780 records in
    0+2780 records out
    4% real=3:34 usr=0.7 sys=8.7 rd=1 wr=8 mem=37 pg=2 rec=17 sw=0 sig=0 cs=10111

That's almost three times faster! Why can't dump be as good as dd?
Dumps are of major importance; I would have thought that dump would be
the most clever user of the tape drive. I can't believe this. Am I
missing something? I must be missing something.
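
To put numbers on "almost three times faster", here is a small worked calculation using only the figures in the two transcripts above. It is a sketch, not part of the original posting: the helper names are made up for illustration, and the direct-to-tape byte count is assumed to be 1103 blocks of 10240 bytes, as dump reported. Both elapsed times include whatever rewind time the runs incurred.

```python
def elapsed_seconds(real):
    """Convert a 'real=M:SS' value such as '9:29' to seconds."""
    minutes, seconds = real.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(slow_real, fast_real):
    """Ratio of the two elapsed times (slow / fast)."""
    return elapsed_seconds(slow_real) / elapsed_seconds(fast_real)


def throughput(byte_count, real):
    """Average bytes per second over the whole run."""
    return byte_count / elapsed_seconds(real)


if __name__ == "__main__":
    direct_bytes = 1103 * 10240   # assumption: blocks reported by dump
    piped_bytes = 11295744        # bytes reported by dump to stdout

    print("speedup: %.2f" % speedup("9:29", "3:34"))
    print("direct dump: %.0f bytes/s" % throughput(direct_bytes, "9:29"))
    print("dump | dd:   %.0f bytes/s" % throughput(piped_bytes, "3:34"))
```

The ratio comes out a little under 2.7, which is what "almost three times" is rounding up from.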

From idallen Fri Jun 8 02:46:42 1990
Subject: Fun with dump

Ultrix dump of root to nowhere:

    bandicoot# time dump 0f - / >/dev/null
    [dump stuff deleted]
    16% real=0:22 usr=0.9 sys=2.7 rd=8 wr=4 mem=332 pg=0 rec=0 sw=0 sig=0 cs=959

Ultrix rdump of root to nowhere:

    bandicoot# time /bin/rdump -0f bandicoot:/dev/null /
    [dump stuff deleted]
    39% real=0:55 usr=2.8 sys=19.1 rd=2 wr=6 mem=282 pg=3 rec=60 sw=0 sig=0 cs=4533

Ultrix rdump of root to a real tape:

    bandicoot# time rdump -0f recorder:/dev/nrmt0h /
    [dump stuff deleted]
    [I hit break after 6 minutes when dump estimated the dump would
    take another 20 minutes]

Ultrix dump of root to rsh/dd to a tape:

    bandicoot# time dump 0f - / | rsh rec dd bs=32k rbuf=2 wbuf=2 of=/dev/rmt0h
    [dump stuff deleted]
    7% real=1:31 usr=1.0 sys=6.0 rd=2 wr=4 mem=351 pg=0 rec=3 sw=0 sig=0 cs=4900
    15% real=2:48 usr=2.5 sys=24.1 rd=15 wr=7 mem=206 pg=0 rec=3 sw=0 sig=0 cs=10300

What I learned: Don't use rdump. It's an order of magnitude slower
than a pipe to dd. In fact, even dump is slower than dump to stdout
piped into dd with wbuf=2, because of bugs in the Ultrix nbuf code.
At least Ultrix dd handles multiple tapes and multi-buffer writes;
isn't that convenient?

From idallen Fri Jun 8 04:19:02 1990
Subject: More fun with dump on Ultrix

You'd think that the dump command would have the smarts in it to write
tapes efficiently. Wrong. I wrote a simple program that reads stdin,
builds a 32K buffer, and writes it out using Ultrix double-buffer I/O.
I used it on 198525952 bytes of /usr file system on our DS5400, sent to
a TK70 295 MB tape cartridge:

    # time sh -c "dump 0f - /usr | ./a.out >/dev/rmt0h"
    [dump info deleted]
    5% real=43:01 usr=11.3 sys=131.8 rd=4 wr=8 mem=63 pg=2 rec=18 sw=0 sig=0 cs=138864

43 minutes elapsed time. Compare that with what the default gets you:

    # dump 0 /usr
    [dump info deleted]
    DUMP: Estimates based on 1200 feet of tape at a density of 10240 BPI...
    DUMP: This dump will occupy 19400 (10240 byte) blocks on 2.29 tape(s).

Woops.
This dump won't even fit on the tape using the defaults. Even if I
kludged the tape size to make it seem to fit, it would still take
3 *hours* to dump.

Ultrix dump also uses double-buffer I/O, but it specifies 8 buffers
instead of just 2. The software release notes for Ultrix 3.1C suggest
that 2 buffers work better than more than 2, and this sure bears that out.

From idallen Fri Jun 8 17:08:17 1990
To: comp.unix.ultrix
Subject: More undocumented performance issues with dump

Dump to stdout (a tape):

    # time dump 0bf 32k - / >/dev/nrmt0h
    DUMP: 11318272 bytes were dumped to Standard Output
    DUMP: Dump is done
    0% real=15:34 usr=0.3 sys=1.6 rd=0 wr=4 mem=92 pg=2 rec=18 sw=0 sig=0 cs=2058

Dump to the same tape directly:

    # time dump 0bf 32k /dev/nrmt0h /
    DUMP: 345 tape blocks were dumped on 1 tape(s)
    DUMP: Dump is done
    0% real=7:16 usr=0.4 sys=1.6 rd=0 wr=4 mem=94 pg=2 rec=18 sw=0 sig=0 cs=2019

Ultrix dump assumes that any output to stdout is to a pipe; it doesn't
do the same stat() [fstat()] tests to determine device type that it
does when you give the file name on the command line.

From idallen Fri Jun 8 17:16:46 1990
To: jpe@egr.duke.edu
Subject: Re: Why isn't dump maximally efficient with TK70 tapes?

> Problem #2 -- according to my man pages for "dd" the rbuf and wbuf options
> cannot be used at the same time. Besides, the default wbuf value is 8
> for devices that support it.

Indeed, you, the source, and the man page are correct. The example in
section 1.1.13 of the Ultrix 3.1C release notes is wrong, and I copied
it. Silly me -- I thought the release notes knew something the man
page did not. The first option wins and over-rides following [rw]buf
values.

What is not wrong is the statement in 1.1.13 "to get the most
performance gain, use a value of 2 with rbuf and wbuf options". The
default 8 buffers cause *worse* performance than specifying 2.

> Problem #3 -- The block size you specified to "dd" was wrong. Dump writes
> in 10k blocks, not 32k.
> Also you need to specify the obs (instead of bs)
> and specify a cbs equal to the obs. This will buffer the input to the
> output block size, then write it to tape. Restore will read a tape
> created this way, I doubt if it can read yours.

No, I wanted to write 32k blocks; it's faster and more efficient. I
write dump tapes far more than I read them; I wanted to speed up the
writing. Restore reads such tapes just fine if unblocked first:

    # dd if=/dev/rmt0h bs=32k rbuf=2 | restore -if -

You're right about the failure to buffer up to the output buffer size,
but I don't want to pay the price of using dd to make the conversion --
it's way too slow. See my latest note in comp.unix.ultrix about a
little program that buffers up to 32k and writes.

> Problem #4 -- You should note instead the system times and percentage of
> CPU used. On my VAXserver 3600 these times jumped dramatically in order
> to give me a few seconds real-time savings. Also when I used a no-rewind
> device "dump" was actually faster than the "dd pipe." On a CPU-loaded
> system you might not have such a big win..

I observed a factor of three in real-time performance; more than a few
seconds and of importance to us. The tape rewind added 12 seconds to
any times I posted.

Looking at the source to dd, I see that if one doesn't use "bs=", it
copies the data painfully from the input buffer to the output buffer
one byte at a time. No wonder that eats cpu. I wrote a simple program
to buffer input and write it out in 32k chunks; this works much better
than dd, but it won't handle multiple volumes. See my comp.unix.ultrix
posting.

> Problem #5 -- What happens if one of your partitions becomes larger than
> a TK70?

Ultrix dd handles multi-volumes.

I think the problem is just that dump uses too many buffers and in
Ultrix 3.1C that makes things worse rather than better. Or perhaps
dump's buffers aren't aligned on page boundaries, and dd's are.
(See Ultrix Version 3.1C Release Notes section 1.1.12.)
-- 
-IAN! (Ian! D. Allen) idallen@watcgl.uwaterloo.ca idallen@watcgl.waterloo.edu
 [129.97.128.64] Computer Graphics Lab/University of Waterloo/Ontario/Canada
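
The "little program that buffers up to 32k and writes" is described in the messages above but its source is not included. The following is only a sketch of the reblocking idea, not the original program: it is written in Python for brevity, the function and parameter names are hypothetical, it leaves out the Ultrix double-buffer I/O entirely, and it assumes the short final record is written unpadded. Like the program described, it does not handle multiple volumes.

```python
import os

RECORD_SIZE = 32 * 1024  # 32K records, as in the messages above


def reblock(read, write, record_size=RECORD_SIZE):
    """Collect input into fixed-size records and write each with one call.

    read(n) returns up to n bytes, or b"" at end of input (a pipe from
    dump typically delivers much less than 32K per read).
    write(data) receives exactly one record per call.
    Returns the number of records written.
    """
    buf = bytearray()
    records = 0
    while True:
        chunk = read(record_size)
        if not chunk:
            break
        buf += chunk
        # Emit every complete record currently held in the buffer.
        while len(buf) >= record_size:
            write(bytes(buf[:record_size]))
            del buf[:record_size]
            records += 1
    if buf:
        # Assumption: the trailing short record is written as-is.
        write(bytes(buf))
        records += 1
    return records


if __name__ == "__main__":
    # Unbuffered descriptor I/O, so each write() is a single system
    # call and therefore a single record on a raw tape device.
    reblock(lambda n: os.read(0, n), lambda data: os.write(1, data))
```

Saved under a hypothetical name such as reblock.py, it would sit in the pipeline where ./a.out does above: dump 0f - /usr | python reblock.py >/dev/rmt0h. A tape written this way would be read back through dd with bs=32k, as shown in the reply to Problem #3.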