Xref: utzoo comp.unix.aix:733 comp.periphs.scsi:168
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!wonky!mjacob
From: mjacob@wonky.Sun.COM (Matt Jacob)
Newsgroups: comp.unix.aix,comp.periphs.scsi
Subject: Re: SCSI hiding geometry
Message-ID: <132963@sun.Eng.Sun.COM>
Date: 15 Mar 90 07:06:53 GMT
References: <1660@aber-cs.UUCP> <51507@sgi.sgi.com> <132788@sun.Eng.Sun.COM> <1990Mar11.045128.17732@ico.isc.com>
Sender: news@sun.Eng.Sun.COM
Reply-To: mjacob@sun.UUCP (Matt Jacob)
Organization: Sun Microsystems, Mountain View
Lines: 105

[ Sorry- my machine was down for a couple of days so I am late in
  responding to this.. ]

>...
>> My own personal opinion is that geometry based filesystems are
>> getting to be a bad microoptimization...
>
>But SCSI is not the only interface around, and I think there are some open
>questions about how much device-sensitivity you want in the mid level of
>the file/disk system. That is, if you've got a more traditional disk
>interface (some of which are pretty high performance) you need to deal with
>geometry. Do you want to ignore geometry some of the time? It gets harder
>and harder to know how/where to make the cut.
>
>(My own personal opinion, not necessarily well substantiated, is that SCSI
>was at best premature, and at worst wrong, in trying to hide drive geometry
>from the host system.)
>

Ah, but SCSI wasn't premature- it was/is an extension of the IBM channel
concept to smaller lower-cost machines. Granted, more 'traditional' disk
interfaces need to, and should, allow the main CPU to know and take
advantage of disk geometry. However, the 256-512kb of code to handle the
4.3 filesystem, and the main CPU cycles spent running it, can be
considered *wasted* if you can offload the processing.

>>...With the coming of SCSI-2
>> multiple command targets, it seems to me that one should just
>> concentrate on getting requests out to the target as quickly
>> as possible and let the microprocessor on the drive figure out
>> the best order do them in.
>
>This raises a sticky issue of who's in control of the disk system.
>Consider reliability issues. Two examples come to mind. First, in a UNIX
>file system, you probably want to have some control over the order of
>operations so that you can have some reasonable assurance that operations
>on inodes, indirect blocks, directories, and data happen in a way that will
>allow you a good chance for recovery if you crash while there are
>operations in the queue. Second, in a database it is essential that you be
>able to control the sequencing of operations so that commits really commit,
>journaling happens when you expect, etc.

There are quite adequate mechanisms in SCSI to handle this (e.g., the
*real* use of linked commands, which provide a means of specifying
atomic operations w.r.t. multiple sets of i/o from a single initiator).
It is true that Unix itself does not provide good hooks for reliability
or database sequencing, but to criticize SCSI for allowing you to do
things your OS can't handle well to begin with is the tail wagging the
dog.

>
>Frankly, I don't want to trust J Random Microcoder to give a disk-write-
>reordering algorithm that won't screw things up. Even if I'm assured of
>some sort of "fair" algorithm, trying to sequence things in the kernel to
>compensate for all the possible variants of reordering sounds like a pain.
>(It's also redundant in a perverse way: You have to write code to un-do
>decisions which are going to be made for you that you don't want.)
>

Now this is a valid point, in a way. I've gone over this issue in
several different contexts (having been a microcoder in my dim past).
In the case where you have more than one decision maker, *one* must make
the decisions as to optimal i/o ordering, etc., else chaos results. In
the case of distributed I/O subsystems (SCSI or otherwise), I have found
that you *have* to do things like *not* disksort on the stub cpu side of
things. If you have the BSD filesystem, you *must* specify things like
0 rotational delay, etc., in order to *not* have the filesystem and the
i/o subsystem cancel each other out.

Ideally, one would like a filesystem to form requests that have
precedence, priority, and cache-retention parameters. That is, the
filesystem associates with each piece of data it wants transferred loose
statements like:

    "Write this *NOW*"
    "Write this, and hang on to it, 'coz I'll likely ask for it
     back soon."
    "Write this *before* Reading *that*"

and so on. I feel that we (as in the Unix commercial marketplace) are
very far from that (flame on, everyone!)....

>I think it would make the job of kernel folks a lot easier if they could
>deal with interfaces which just attempt to be fast in a predictable way,
>instead of trying to be smart.

For about two years at Sun, I had posted on my office door a one-page
printout (well, it was small font) entitled "The Ideal and Perfect
Driver". It was for the PDP-11 RK05 removable 2.5mb drive. Also, I have
kicking around at home a 200-odd word PDP-11 assembler language RM03
driver I wrote for RT-11. These are *very* simple.

Unfortunately, I have not been able to beg, plead, extort, bribe, or
otherwise convince hardware engineers to take such simple interfaces
and run them up to a decent speed. Ergo, complexity in s/w has been
a natural result.

-matt
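
[ Illustration only. The request hints described above ("Write this
  *NOW*", "hang on to it", "Write this *before* Reading *that*") can be
  sketched as a toy model. Nothing here is a real SCSI or Unix
  interface: Request, Urgency, schedule(), and every field name are
  hypothetical, and the policy (ordering constraints first, then
  urgency, then ascending block number) is just one assumed way a
  target might act on such hints. ]

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Urgency(Enum):
    NOW = 0        # "Write this *NOW*"
    WHENEVER = 1   # the target may schedule this as it sees fit


@dataclass
class Request:
    # All names are hypothetical, chosen only for this sketch.
    tag: str                       # label the filesystem gives the request
    op: str                        # "read" or "write"
    block: int                     # logical block number (no geometry)
    urgency: Urgency = Urgency.WHENEVER
    retain: bool = False           # "hang on to it": cache hint, unused by schedule()
    after: Set[str] = field(default_factory=set)  # tags that must finish first


def schedule(requests: List[Request]) -> List[Request]:
    """Pick an order: honor 'after' constraints, then urgency,
    then ascending block number among whatever is ready."""
    pending = list(requests)
    done: Set[str] = set()
    order: List[Request] = []
    while pending:
        ready = [r for r in pending if r.after <= done]
        if not ready:
            raise ValueError("circular or unsatisfiable 'after' constraint")
        nxt = min(ready, key=lambda r: (r.urgency.value, r.block))
        order.append(nxt)
        done.add(nxt.tag)
        pending.remove(nxt)
    return order


reqs = [
    Request("data", "write", block=9000, retain=True),
    Request("inode", "write", block=40, after={"data"}),
    Request("log", "write", block=70000, urgency=Urgency.NOW),
    Request("reread", "read", block=9000, after={"data"}),
]
print([r.tag for r in schedule(reqs)])
```

[ With these four requests the order comes out as log, data, inode,
  reread: the urgent write goes first, and the inode write and the
  re-read both wait for the data write even though the inode block has
  the lowest address. The point of the sketch is that there is still
  only one decision maker; the filesystem states constraints and the
  target does the ordering. ]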