Xref: utzoo comp.arch:11704 comp.databases:3829
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!apple!amdahl!rtech!news
From: news@rtech.rtech.com (USENET News System)
Newsgroups: comp.arch,comp.databases
Subject: Re: *big iron*
Message-ID: <3794@rtech.rtech.com>
Date: 10 Oct 89 02:25:21 GMT
References: <21962@cup.portal.com> <1989Sep12.031453.22947@wolves.uucp> <22130@cup.portal.com> <1989Sep16.044013.429@wolves.uucp> <3752@rtech.rtech.com> <829@metaphor.Metaphor.COM>
Reply-To: pasker@rtech.com (Bob Pasker)
Organization: Relational Technology, Inc.
Lines: 80

In article <829@metaphor.Metaphor.COM> philf@xymox.metaphor.com (Phil Fernandez) writes:
>In article <3752@rtech.rtech.com> daveb@rtech.UUCP (Dave Brower) writes:
>> ... Some
>>airline reservation systems are said to have huge farms of disk where
>>only one or two tracks are used on the whole pack to avoid seeks, for
>>instance.

>With elevator seeking, disk I/O's in the queue are ordered in such a
>way to minimize seek latency between I/O operations.

A number of techniques which we used on a VAX-based TP exec called the
Transaction Management eXecutive-32 (TMX-32) were:

 - per disk seek ordering - as stated above

 - which disk seek ordering - with mirrored disks, choose the disk
   with the heads closest to the part of the disk you're gonna read.
   (Sometimes just flip-flopping between the two is enough.)

 - coalesced transfers - for instance, if you need to read tracks N,
   N+3 and N+7, it's sometimes faster to read tracks N to N+7 and sort
   out the transfers in memory.

 - single-read-per-spindle-per-transaction - split up heavily accessed
   files over N spindles, mapping logical record M to disk (M mod N),
   physical record (M div N), such that on the average only one disk
   seek needs to be made per transaction (in parallel, of course).
   This is worthwhile when the transactions are well defined.

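For anyone who wants to play with the four ideas above, here is a
rough Python sketch. It is not TMX-32 code. The function names, the
use of plain cylinder/track numbers, and the max_gap threshold are all
assumptions made for illustration, and the record mapping takes the
physical record to be the integer quotient M div N.

```python
def elevator_order(pending, head):
    """Per-disk seek ordering: one upward sweep from the current head
    position, then a downward sweep for the requests below it."""
    up = sorted(c for c in pending if c >= head)
    down = sorted((c for c in pending if c < head), reverse=True)
    return up + down


def pick_mirror(head_positions, target):
    """Which-disk ordering: index of the mirror member whose heads are
    closest to the target cylinder (first one wins a tie)."""
    return min(range(len(head_positions)),
               key=lambda i: abs(head_positions[i] - target))


def coalesce(tracks, max_gap):
    """Coalesced transfers: merge wanted tracks into (first, last)
    ranges when neighbours are at most max_gap apart (hypothetical
    threshold), so one larger read replaces several small ones."""
    ranges = []
    for t in sorted(set(tracks)):
        if ranges and t - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], t)
        else:
            ranges.append((t, t))
    return ranges


def locate(record, spindles):
    """Striping: logical record M on N spindles lives on disk M mod N
    at physical record M div N."""
    return record % spindles, record // spindles
```

With four spindles, locate(10, 4) puts logical record 10 on disk 2 at
physical record 2, and consecutive records land on different spindles
so they can be fetched in parallel. All of this presumes you actually
know where the heads and tracks are.
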
This task became considerably more difficult when DEC introduced the
HSC-50 super-smart, caching disk controller for the VAXcluster and the
RA-style disks:

1) It was impossible to know the PHYSICAL location of a disk block,
   due to dynamic, transparent bad-block revectoring and the lack of
   on-line information about the disk geometry. We placed the files
   carefully on the disk so that they started on a cylinder boundary,
   adjacent to other files, and assumed that they were "one
   dimensional."

2) Some of the optimizations (seek ordering and command ordering) were
   done in the HSC itself, so we didn't do them on HSC disks.

3) HSC volume shadowing made the optimizations to our home-grown
   shadowing obsolete. We kept our shadowing to use in non-HSC
   environments, like uVAXes and locally connected disks, and because
   it was per-file based, not per volume.

Using these techniques, I ran the million-customer TP benchmark at
76 TPS on a VAX 8600 (~4 MIPS). I don't remember the $/TPS (of
course), but it might have been pretty high because there were a LOT
of disk drives. We might have eked out a few more TPS if we had had
physical control over the placement of the disk blocks, but probably
not more than a few. I also felt that I never knew what the disk was
'really doing' because so much was hidden in the HSC; being the
computer programmer that I am, I wanted to know where each head was at
each millisecond :->.

(The 76 TPS bottleneck was the mirrored journal disk: although it was
written sequentially, it was still necessary to write to it at the
close of each transaction. The next step would have been to allow
multiple journal files, but since the runner-up was about 30 TPS, we
never got around to it :->.)

As an aside, for you HSC fans building this kind of stuff, it is
possible that large write I/Os to an HSC-served disk will be broken up
into multiple physical I/O operations to the disk.
This means that if you are just checking headers and trailers for
transaction checkpoint consistency, you may have bogus stuff in the
middle with perfectly valid header and trailer information if the HSC
crashed during the I/O.

- bob
+-------------------------+------------------------------+--------------------+
! Bob Pasker              ! Relational Technology        !                    !
! pasker@rtech.com        ! 1080 Marina Village Parkway  ! INGRES/Net         !
!                         ! Alameda, California 94501    !                    !
!                         ! (415) 748-2434               !                    !
+-------------------------+------------------------------+--------------------+