Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!know!zaphod.mps.ohio-state.edu!sdd.hp.com!hplabs!hpfcso!hpldola!hp-lsd!was
From: was@hp-lsd.COS.HP.COM (Bill Stubblebine)
Newsgroups: comp.os.cpm
Subject: Re: How to speed up Ampro LB+ SCSI?
Message-ID: <8190005@hp-lsd.COS.HP.COM>
Date: 20 Aug 90 17:35:11 GMT
References: <8190004@hp-lsd.COS.HP.COM>
Organization: HP Logic Systems Division - ColoSpgs, CO
Lines: 216

Several weeks ago, I asked for advice on how to improve throughput for bulk data transfers from my SCSI hard disk to my SCSI QIC tape drive. For those who missed the original article, my configuration is:

    Ampro LB Z80+ (w/built-in SCSI interface)
    Adaptec ACB4000 (not 4000A) SCSI hard disk controller
    Seagate ST-125 20 MB 40 ms hard disk drive
    3M MCD-403 40 MB QIC SCSI tape drive
    NZ-COM/Z-System

The 3M MCD-403 SCSI tape drive was added recently to support backups. As I started transferring data between the hard disk and the tape drive, I discovered that although the SCSI disk performance was adequate for interactive and disk-to-disk operations, the hard disk could not source or sink data fast enough to keep the tape drive streaming during transfers.

Before I posted my original request, I had experimented with several disk transfer strategies to try to increase throughput. All of my tests employed standard BIOS calls that transfer 128 bytes per call, based on Ampro's BIOS deblocking algorithm, which reads or writes 512-byte SCSI logical blocks on the hard disk.

My experiments indicated that BIOS calls could never achieve sufficient throughput to keep the cartridge tape drive streaming, no matter what the interleave factor was on the tape drive or on the disk drive. With all the stopping, repositioning and restarting of the cartridge drive, the overall throughput from disk to tape was under 3K bytes per second, plus the agony of hearing the drive stop and start for each 8K SCSI tape block transferred.
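To put the BIOS overhead described above in perspective, here is a small sketch that counts how many 128-byte BIOS calls and 512-byte SCSI logical blocks make up one 8K tape block. Python is used purely as a calculator; the constant and function names are illustrative and are not part of the Ampro BIOS.

```python
# Sizes taken from the description above; all names are illustrative only.
BIOS_RECORD_BYTES = 128      # bytes moved per standard BIOS read/write call
SCSI_BLOCK_BYTES = 512       # SCSI logical block size used by the deblocking code
TAPE_BLOCK_BYTES = 8 * 1024  # one SCSI tape block


def units_per_tape_block(unit_bytes: int) -> int:
    """Return how many units of unit_bytes fit exactly in one tape block."""
    if TAPE_BLOCK_BYTES % unit_bytes != 0:
        raise ValueError("unit size must divide the tape block size")
    return TAPE_BLOCK_BYTES // unit_bytes


bios_calls = units_per_tape_block(BIOS_RECORD_BYTES)
scsi_blocks = units_per_tape_block(SCSI_BLOCK_BYTES)
print(f"{bios_calls} BIOS calls, {scsi_blocks} SCSI blocks per 8K tape block")
```

Every 8K tape block therefore costs 64 separate BIOS calls, each with its own call overhead, which gives a feel for why the BIOS path struggles to feed a streaming drive.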
Having run out of ideas, I asked the net for advice, and was gratified by the quantity and quality of the responses I received.

To make a long story short, I have increased the overall throughput of disk to tape transfers from under 3K bytes per second to 12.7K bytes per second, allowing 10 megabytes to be backed up in about 13 minutes unattended. This is bliss compared to the endless attended floppy disk backups I am accustomed to.

To assist anyone who may be facing similar system integration problems, I decided to keep a log of my experiments, which is summarized below.

The quadrupling of throughput from 3K bytes/sec to 12.7K bytes/sec resulted from three categories of improvements to my configuration:

    1. Read or write as many bytes as possible in each SCSI command, both from the SCSI hard disk and the SCSI tape drive.

    2. Use the Z80 high-speed INIR/OTIR I/O instructions instead of software controlled byte-by-byte handshaking to talk to the 5380 SCSI interface chip on the Ampro LB+.

    3. Once #1 and #2 are implemented, select optimal interleave factors on both the hard disk and the tape drive to maximize overall throughput.

The biggest improvement came from #1. Reading 8K from the disk in one SCSI command more than doubled the overall throughput compared to normal BIOS calls, providing streaming operation in the tape drive for tape interleave factors of 6:1 or greater.

    HD interleave:        9:1
    HD transfer mode:     byte-by-byte
    HD transfer size:     8K x 1
    Tape interleave:      6:1
    Tape transfer mode:   byte-by-byte
    Tape transfer size:   8K x 1
    Net throughput:       6631 bytes/sec

Next, I modified the disk read routine to read 8K bytes in two 4K SCSI commands, thereby simulating the processing of two distinct 4K CP/M disk allocation groups. The results were the same as for a single 8K SCSI operation, i.e., the tape keeps streaming.
This experimental result suggests that the disk-to-tape backup program should bypass the BIOS altogether and process CP/M allocation groups directly from the CP/M disk directory entries: convert the (4K-byte) CP/M allocation group number into a SCSI logical block number, then read all 4K of the allocation group from the disk in one SCSI command. This should be a robust strategy, because (in the Ampro system) HD space cannot be allocated in chunks smaller than 4K bytes = 1 CP/M allocation group.

    HD interleave:        9:1
    HD transfer mode:     byte-by-byte
    HD transfer size:     4K x 2
    Tape interleave:      6:1
    Tape transfer mode:   byte-by-byte
    Tape transfer size:   8K x 1
    Net throughput:       6631 bytes/sec

Next, I changed the SCSI handshaking from byte-by-byte to INIR/OTIR burst mode for both the hard disk and the MCD tape drive. This cut the burst transfer time from 15us per byte to 5.25us per byte for both devices.

Using a scope to monitor the SCSI bus, I then experimented with bulk SCSI transfers from the hard disk at various disk interleave factors, obtaining the following surprising results:

    Hard Disk     Time to transfer
    Interleave    8192 bytes HD->memory
    ----------    ---------------------
       2:1               165ms
       3:1                80ms
       4:1                95ms
       5:1               110ms
       6:1               120ms
       7:1               140ms
       8:1               120ms
       9:1               140ms

At an interleave of 3:1, the fastest for bulk SCSI transfers, the hard disk supports a burst transfer rate of 5.25us per byte = 190.4K bytes/sec to the Ampro host, and a sustained data transfer rate of 102.4K bytes/sec, not bad for a lowly Z-80.

Note: The previous and new interleave factors of 2:1 and 3:1, respectively, have virtually identical throughput for 512-byte BIOS transfers to and from disk. However, for multi-block transfers like the ones I intend to use for tape backups, an interleave of 3:1 produces a huge (i.e., more than double) increase in disk throughput compared to an interleave factor of 2:1.
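The two rates quoted for the 3:1 interleave follow directly from the measurements. A short sketch that redoes the arithmetic is given here, with Python used only as a calculator; the names are illustrative and the timings are copied from the interleave table.

```python
# Measured time (ms) to move 8192 bytes HD->memory, keyed by interleave N (N:1).
TRANSFER_MS = {2: 165, 3: 80, 4: 95, 5: 110, 6: 120, 7: 140, 8: 120, 9: 140}
BLOCK_BYTES = 8192
BURST_US_PER_BYTE = 5.25  # INIR/OTIR burst time per byte


def sustained_rate(ms: float) -> float:
    """Sustained bytes/sec for one 8192-byte transfer taking ms milliseconds."""
    return BLOCK_BYTES / (ms / 1000.0)


burst_rate = 1_000_000 / BURST_US_PER_BYTE    # bytes/sec while a burst is running
best = min(TRANSFER_MS, key=TRANSFER_MS.get)  # interleave with the shortest time

print(f"burst rate: {burst_rate / 1000:.1f}K bytes/sec")
for n, ms in sorted(TRANSFER_MS.items()):
    print(f"{n}:1  {sustained_rate(ms) / 1000:6.1f}K bytes/sec sustained")
print(f"fastest interleave: {best}:1")
```

On these figures the 3:1 interleave comes out fastest, at a little over twice the sustained rate of 2:1, and the burst rate works out to roughly 190K bytes/sec. K here means 1000 bytes.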
With the hard disk formatted with interleave factor 3:1 and with burst mode data transfers in effect to both the hard disk and the tape drive, I then experimented with various tape drive interleave factors. The result is that I can now keep the tape drive streaming at a tape interleave factor of 4:1, which is much better than I had originally hoped. The overall disk to tape throughput increased to 9716 bytes/sec in this configuration.

    HD interleave:        3:1
    HD transfer mode:     burst
    HD transfer size:     4K x 2
    Tape interleave:      4:1
    Tape transfer mode:   burst
    Tape transfer size:   8K x 1
    Net throughput:       9716 bytes/sec

Reading data from the hard disk in two 4K byte chunks takes about 80ms. A scope trace of SCSI bus activity indicated that a disk rotation was being lost between reading sequential 4K chunks, even when the two chunks were (logically) adjacent to one another on the same disk track, as is usually the case in large sequential files.

When I repeated the experiments reading 8K from the disk in one SCSI request, the time required to fill the memory buffer from the disk dropped to around 60ms. In this configuration, the tape remained streaming at a tape interleave of 3:1, with overall throughput from the disk to the tape increasing to 12787 bytes/sec.

    HD interleave:        3:1
    HD transfer mode:     burst
    HD transfer size:     8K x 1
    Tape interleave:      3:1
    Tape transfer mode:   burst
    Tape transfer size:   8K x 1
    Net throughput:       12787 bytes/sec

Getting writes to the tape to work was quite an adventure. The same trick that worked effectively for reads from the tape, namely setting the burst mode for 256-byte transfers, caused writes to the tape to hang in mid SCSI phase. The curious thing was that the multi-block writes worked fine when I stepped through them under manual control in the ZSID debugger, but hung when running normally. Figuring there was some race condition between the disk reads and the tape writes, I fiddled around with delays everywhere, to no avail.
Because the multi-block transfers worked OK with byte-by-byte handshaking, I finally concluded that 256 must be the wrong number of data bytes to transfer to the tape controller in a burst during the SCSI data-out phase. But what was the right number? I set the burst mode to 16 bytes per burst, which cut the byte-by-byte overhead by a factor of 16. This worked fine, allowing writes to the tape to stream at a tape interleave factor of 3:1, the same as for reads.

(Note: I still cannot explain why write transfers to the tape drive hang with 256-byte bursts and not with 16-byte bursts. Reads and writes both transfer 8192 bytes from or to the tape controller. This should loop the OTIR instruction exactly 32 times for 256-byte bursts and exactly 512 times for 16-byte bursts. Moreover, the transfer rate in either case is only one third of the tape drive controller's 500Kb/sec rated SCSI burst throughput. Maybe the discrepancy in the number of bytes transferred is on a 16-byte boundary, but I find this hard to believe. My 16-byte burst solution works, but maybe I'll just RTFM one more time...)

None of my experiments thus far involved frequent head seeks on the hard disk, which are bound to add some overhead to the tape transfers, and could cause loss of streaming. To allow some overhead for head seeks, and still keep the tape streaming, I relaxed the tape interleave factor from 3:1 to 4:1.

All in all, I'm quite happy with the results. I know that I can do 12.7K bytes/sec at 3:1 tape interleave, and nearly 10K bytes/sec at 4:1 tape interleave. Depending on the tape interleave I finally settle on, I have either tripled or quadrupled the overall disk-to-tape throughput compared to where I started, and learned a little about my disk drive, my tape drive and the SCSI protocol in the process.

Now it's on to building a primitive file system to manage my backups on the cartridge tape.
Since I envision the tape as just an archive of large backups (.LBR or tar files), without a lot of random access going on, I'm inclined toward using a simple directory structure similar to the one for Novosielski .LBR files, but based on SCSI addressing instead of CP/M tracks and sectors. I'm flexible though, and I'd welcome any suggestions anyone might have regarding a file system for the cartridge tape.

Lastly, a small personal note: Over the years I've had to put up with no end of criticism from associates regarding my ongoing interest in Z80 computers. Still, I'm continually amazed at how far I can keep pushing the envelope of this friendly little OS and CPU.

One of my other hobbies is sailing. I get endless pleasure from trimming the sails, reading the wind, pushing the last 1% out of the system. I get the same feeling when talking to one of those so-called DOS "power users" as I do when some muscle boat goes tearing past me on the water. I remark to myself "very impressive - but what do you do after the first 10 minutes when the novelty's worn off?"

Thanks again for all the help. It's nice to know there is still a group that shares some of my opinions. Perhaps I can return the favor one day.

Bill Stubblebine
Hewlett-Packard Logic Systems Div.
Colorado Springs, CO
was@hp-lsd.hp.com (Internet)
(719) 590-5568