Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!spool.mu.edu!agate!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.questions
Subject: Re: Can't cat tape- big blocks?
Message-ID: <14585@dog.ee.lbl.gov>
Date: 21 Jun 91 23:19:57 GMT
References: <803@adpplz.UUCP> <1991Jun14.094822.7029@prl.dec.com> <14433@dog.ee.lbl.gov> <829@adpplz.UUCP>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Distribution: usa
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 139
X-Local-Date: Fri, 21 Jun 91 16:19:57 PDT

>>In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com
>>(Boyd Roberts) writes:
>>>No, never do that.  With 9 track tapes you must do I/O that will
>>>ensure that the _whole_ tape block will be read. ...

>In <14433@dog.ee.lbl.gov> I suggested:
>>It seems to me that the tape driver should return an error if you
>>ask for 1K and the tape drive reads 10K.

In article <829@adpplz.UUCP> martin@adpplz.UUCP (Martin Golding) writes:
>How about having the tape driver return the data?

This is a great idea ... but it just will not work, not in conventional
Unix contexts.

>You know, just like the disk driver, and the terminal driver, and the
>ethernet driver, and the printer driver.

The problem is somewhat different from disks, not generally applicable
to terminals, and entirely applicable to Ethernets (for which raw device
read() system calls generally do not exist).  Printers generally do not
return data to the system and are rather irrelevant.

>I _never_ told the system what blocksize my files are,

Files?  Tape blocks are not files (nor are disk blocks); you do not
mount the tape as a file system and open, close, read, write files on
the `tape file system'.  (There *are* some tape devices that can support
this; indeed, 9 track tapes, when extended gaps are used, are to some
extent `block addressable'.  Most 9 track tapes are not written with
extended gaps.)

>Streamers need fixed _buffering_ independent of block size,

Streamers?  Who said anything about streamers?

>Vast perverse heresy: If you built a streams tape driver, you could
>handle multiple volumes and arbitrary kinds of labeling, independently of
>your process! just like the hype says.

And indeed, if you did this you could exchange tapes with your Unix
buddies, and so forth.  But then the day comes when someone hands you a
`foreign' tape.

	(ominous background music)

Seriously:

The interface we are using here is the `raw' device interface.  If you
talk to a raw disk, the driver forces you to use the disk's block size:
reading or writing one byte from /dev/rdk3c will fail.  (On some Unix
boxes, it fails by destroying most of the sector, rather than returning
an error: not pretty.)

Nine track tapes have `records'.  The records show through on the raw
device, because it *is* the raw device.  The records have variable
sizes, and in fact do change size.  In order to copy a 9 track tape you
must retain not only the data, but also the block sizes.  Foreign
machines actually *use* this stuff, for some reason.

The Unix raw device semantics, inasmuch as there are any defined
semantics at all, are that each read() or write() system call translates
to a single device operation.  Hence, when you write() 4096 bytes to a
raw 9 track tape, the tape driver tells the tape formatter to write one
4096-byte record.  Likewise, when you read() 4096 bytes from a raw 9
track tape, the driver tells the formatter to read one 4096-byte record.
If the record under the tape drive's read head just happens to be 10240
bytes, rather than 4096 bytes, the formatter will THROW AWAY the `extra'
6144 bytes.  It is gone; the driver never sees it.  Typically, all the
driver sees is a flag bit in the transfer status, `record length short':
`I threw away some of your data.  Sorry.'

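To make the one-read-one-record rule concrete, here is a toy model in
Python.  It is only a sketch: RawTape, copy_records, and the `truncated'
flag are names made up for this illustration, not any real driver
interface, and the model ignores tape marks and errors entirely.  It
simply shows what happens to record sizes when the caller's buffer is
smaller than the record under the head.

```python
# Toy model only: RawTape and copy_records are hypothetical names,
# not a real driver API.
class RawTape:
    """A tape as a list of variable-length records; one read() = one record."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0              # index of the record under the read head
        self.truncated = False    # stands in for a status bit: data was dropped

    def read(self, nbytes):
        """Return at most nbytes of the next record and discard the rest."""
        if self.pos >= len(self.records):
            return b""            # end of tape in this model
        record = self.records[self.pos]
        self.pos += 1             # the head has passed the record
        if len(record) > nbytes:
            self.truncated = True
        return record[:nbytes]


def copy_records(tape, bufsize):
    """Read record by record, keeping each block's data and size."""
    out = []
    while True:
        block = tape.read(bufsize)
        if not block:
            break
        out.append(block)
    return out


records = [b"a" * 10240, b"b" * 4096, b"c" * 10240]

small = RawTape(records)
lossy = copy_records(small, 4096)      # buffer smaller than some records
big = RawTape(records)
exact = copy_records(big, 65536)       # buffer at least as large as any record

print([len(b) for b in lossy], small.truncated)   # [4096, 4096, 4096] True
print([len(b) for b in exact], big.truncated)     # [10240, 4096, 10240] False
```

With the 4096-byte buffer, both 10240-byte records come back cut to 4096
bytes and the original block sizes are lost; with a buffer at least as
large as the largest record, data and sizes both survive.
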
Disk drivers do not have this problem, because disk sectors have a fixed
size that is known in advance.%

	[%Ignore those IBM drives behind the curtain!]

Of course, the driver could backspace the tape and reissue the read,
asking for more data.  There are two problems:

 a) the driver does not know how *much* more data to read;
 b) the driver does not have a place to put the extra data anyway.

You are using the raw interface, not a buffering interface; there is
nowhere to stash the leftover data.

You can use the block device, and go through the block device buffer
system.  However, it generally has some particular size it expects, or
some particular range of sizes.  Typically this is 512 bytes or some
multiple thereof, usually up to 8192 bytes, sometimes 16384 bytes; on a
few systems, the block device buffers will even handle 65536 bytes.
9 track tapes are typically written with 10240 byte or 32768 byte
records, which often will not fit anyway.

The problem could be solved by adding a whole new abstraction (a `tape'
interface with large buffers that, on read, may be only partially
filled), but Unix systems generally get away without this.

Why are tty interfaces different?  Well, first, you are not using the
raw device (not even in `raw' mode).  Ttys are regular enough, and
well-enough understood, to slap an abstraction over top of them and
ignore the gritty details of which bits are mark and which are space.
This *does* sometimes cause problems; there are people who need
particular timing sequences of marking and spacing, and there are
interfaces that can do it, with Unix boxes that cannot.  But it is not
often a problem (unlike 9-track tape exchange, where little sanity
reigns).  (Note that POSIX spent time wrangling over the tty interface,
even though they started with the System III stuff, which was clearly a
better control abstraction than the V7 stuff found in 4.[123]BSD.  Even
the well-defined ttys are not well-enough defined for some.)

How about Ethernets?
Well, not many Unix systems let you open /dev/en0 and read() from it.
If you could, and if you asked for ten bytes, and 1536 bytes showed up,
the driver would have to save them somewhere, because there is no going
back.  Fortunately, in this case, there is an easy maximum (1536) and
the software abstraction involves protocol demultiplexing already, so
already the software must read into private buffers, and can make
whatever arrangements it likes.

If there were a raw Ethernet interface, though, it might well be best if
it required 1536-bytes-or-more on each read() system call.  Certainly it
should be able to tell you whether you lost something.  As it is, the
only way a tape driver can do this now is to return an error.  Most do
not even bother: and when you copy your tape with

	dd if=/dev/rmt8 of=/dev/rmt9 bs=10k

but the record size was 32k, you never even know that your copy is
useless.

Basically, then, you have two choices:

 a) Throw a lot of code into the kernel to add `cooked tape devices',
    somewhat like cooked ttys.  You will probably have to leave raw
    tape devices in anyway, for tape exchange purposes.

 b) Leave the ugly semantics of 9-track tapes exposed through the raw
    interface, and let those programs that deal with tapes, also deal
    with the Outside World.

For some reason, most people seem to go for choice (b).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov