Path: utzoo!mnetor!motto!ecijmm!eci386!clewis
From: clewis@eci386.UUCP
Newsgroups: comp.unix.i386
Subject: Re: ESDI controller recommendations
Message-ID: <1989Aug29.230048.19130@eci386.uucp>
Date: 29 Aug 89 23:00:48 GMT
References: <121@mdi386.UUCP> <1474@wb3ffv.ampr.org> <4843@looking.on.ca>
Reply-To: clewis@eci386.UUCP (Chris Lewis)
Organization: R. H. Lathwell Associates: Elegant Communications, Inc.
Lines: 110

In article <4843@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>While cylinder or track caching is an eminently sensible idea that I
>have been waiting to see for a long time, what is the point in the
>controller or drive storing more than that?
>
>Surely it makes more sense for the OS to do all other cache duties.
>Why put the 512K in your drive when you can put it in your system and
>bump your cache there?   Other than the CPU overhead of maintaining the
>cache within the OS, I mean.  I would assume the benefit from having
>the cache maintained by software that knows a bit about what's going
>on would outweigh this.

I've had quite a bit of exposure to the DPT caching disk controllers,
so I'll outline some of the interesting points.  Some of these apply
to DPT controllers generally, some only to the models I was playing with
(ESDI and ST506 disk interface versions with a SCSI host interface), and
some to caching controllers in general.

	1) Write-after caching: Most systems do their swapping and/or paging
	   raw, bypassing the buffer cache.  Thus they must *wait* for each
	   write to complete before reusing the memory, e.g., an average of
	   28 ms with an ST506 drive.  With write-after, you can reuse the
	   memory in 0.5 ms no matter how slow your drive is (unless the
	   cache really fills).

	   I installed one of these suckers on a Tower 600 with 4MB running
	   Oracle.  We were immediately able to double the number of Oracle
	   users, from 4 to 8 simultaneous actions, with considerably better
	   response for all 8.  (Oracle 4.1.4 is a pig!  So was the host
	   adapter at the time: 3-6 ms to transfer 512 bytes!)
	   
	   A look at the controller statistics showed that the system was
	   swapping like mad, but virtually *no* physical disk I/Os actually
	   occurred.  That is, blocks were being read back so fast that the
	   controller never needed to write them out.

	   Of course, you can get a similar effect by adding physical memory
	   to the system; however, DPT memory is cheaper than Tower memory...

	2) Host memory limitations: how does 16MB of main memory used almost
	   exclusively by programs, plus 12MB of buffer cache on the
	   controller, strike you?  (These are AT-style system limitations.)
	   Otherwise there are lots of tricky trade-offs.
	
	   On the other hand, when you have lots of physical memory on the
	   host, it makes far more sense to use it for program memory than
	   for a RAM swap disk.

	3) If your kernel panics, the controller still gets a chance to flush
	   its buffers, which is handy particularly if you make the kernel
	   buffers small.  It was sort of scary to see, for the first time, a
	   Tower 400 woof its cookies (so I'm not a perfect device driver
	   writer ;-) and the disk stay active for another 30 seconds...

	4) If you have a power failure, having the cache on the controller
	   is a bad idea, because the kernel does make some assumptions
	   about the order in which I/O occurs.  With the models I was
	   using, it made economic sense to put a UPS on just the controller
	   and disk subsystem.  I don't know whether that's possible with the
	   AT versions, but for those it's cheaper to get a whole-system UPS
	   anyway.

	5) DPT read-ahead can be cancelled by subsequent read requests.

	6) The DPT's algorithms (e.g., replacement policy, lock regions,
	   write-after delay times, dirty-buffer high-water mark, cache
	   allocation among multiple drives) can be tuned.  Most kernels
	   can't be tuned much.

	7) Now we get into the hazy stuff: I'm convinced, from the testing
	   I did with the DPT lashups I built plus my experience inside other
	   kernels, that the DPT has far better caching than most UNIX kernels.

	   Generally speaking, apart from look-ahead (which the DPT supports
	   as well), kernels make no use of special knowledge of the disk,
	   *other* than the inherent efficiency of the file system layout
	   (e.g., Fast File System structures) and free-list sorting
	   (dump/mkfs/restore, anyone?).

	   For example, apart from free-list sorting and other mkfs-style
	   tuning, fio.c and bio.c (the file I/O and block I/O portions of
	   the kernel) don't know diddly squat about the real disk.  The DPT,
	   by contrast, knows it intimately: sectors per track, rotational
	   latency, etc.

	   The DPT uses the elevator algorithm to order disk requests and
	   apparently a better LRU (least-recently-used) replacement
	   algorithm, and it does sector and cylinder skewing and so on.
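Item 1's write-after behaviour, including the case where swapped pages are read back before they ever reach the disk, can be modelled with a small write-behind cache. This is a minimal sketch, not DPT firmware: the class and method names are hypothetical, the "drive" is a plain dict, and LRU replacement is assumed because item 7 mentions it. Physical reads and writes are counted so you can see how few real disk I/Os happen.

```python
from collections import OrderedDict

BLOCK_SIZE = 512  # bytes per block


class WriteBehindCache:
    """Hypothetical write-after (write-behind) cache with LRU replacement."""

    def __init__(self, disk, nslots):
        self.disk = disk            # block number -> bytes; stands in for the drive
        self.nslots = nslots        # cache capacity in blocks
        self.slots = OrderedDict()  # block number -> [data, dirty], LRU first
        self.physical_reads = 0
        self.physical_writes = 0

    def _disk_write(self, blkno, data):
        self.disk[blkno] = data
        self.physical_writes += 1

    def _make_room(self):
        # Evict the least recently used block; only a dirty one costs a write.
        if len(self.slots) >= self.nslots:
            blkno, (data, dirty) = self.slots.popitem(last=False)
            if dirty:
                self._disk_write(blkno, data)

    def write(self, blkno, data):
        # Write-after: keep the data in cache, mark it dirty and return at
        # once, so the caller can reuse its memory without waiting for the drive.
        if blkno not in self.slots:
            self._make_room()
        self.slots[blkno] = [data, True]
        self.slots.move_to_end(blkno)

    def read(self, blkno):
        # A block that was written and is read back soon is served from
        # cache and never touches the disk.
        if blkno in self.slots:
            self.slots.move_to_end(blkno)
            return self.slots[blkno][0]
        self._make_room()
        self.physical_reads += 1
        data = self.disk.get(blkno, bytes(BLOCK_SIZE))
        self.slots[blkno] = [data, False]
        return data

    def flush(self):
        # Write out every dirty block (e.g. after a delay, or on a kernel panic).
        for blkno, entry in self.slots.items():
            if entry[1]:
                self._disk_write(blkno, entry[0])
                entry[1] = False


# "Swap" four pages out and read them straight back, 100 times over.
disk = {}
cache = WriteBehindCache(disk, nslots=8)
for _ in range(100):
    for b in range(4):
        page = bytes([b]) * BLOCK_SIZE
        cache.write(b, page)
        assert cache.read(b) == page
print(cache.physical_writes, cache.physical_reads)  # 0 0
cache.flush()
print(cache.physical_writes)  # 4
```

The point of the design is that write() returns as soon as the data is in the cache; the disk write is deferred until eviction or flush(). That deferral is what lets a controller keep writing after the host has panicked (item 3), and also what makes it vulnerable to a power failure (item 4).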
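The elevator algorithm from item 7 serves pending requests in cylinder order, sweeping the head one way and then reversing, rather than in arrival order. Below is a minimal sketch of the LOOK variant (the head reverses at the last pending request rather than at the edge of the disk), assuming each request is just a cylinder number. The post doesn't describe the DPT's actual scheduler, and the request queue here is a made-up example.

```python
def elevator_order(pending, head, direction=1):
    """Order pending cylinder requests LOOK-style (an elevator variant).

    Serve every request at or beyond the head in the current direction,
    nearest first, then reverse and serve the rest on the way back.
    """
    if direction > 0:
        ahead = sorted(c for c in pending if c >= head)
        behind = sorted((c for c in pending if c < head), reverse=True)
    else:
        ahead = sorted((c for c in pending if c <= head), reverse=True)
        behind = sorted(c for c in pending if c > head)
    return ahead + behind


def seek_distance(order, head):
    """Total number of cylinders the head travels serving `order`."""
    total = 0
    for cyl in order:
        total += abs(cyl - head)
        head = cyl
    return total


# Made-up queue of cylinder numbers, in arrival order; head at cylinder 53.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
sweep = elevator_order(requests, head=53)
print(sweep)                        # [65, 67, 98, 122, 124, 183, 37, 14]
print(seek_distance(requests, 53))  # 640 (arrival order)
print(seek_distance(sweep, 53))     # 299 (elevator order)
```

For this particular queue, sweeping instead of serving in arrival order cuts total head travel from 640 cylinders to 299.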

Unfortunately, I no longer have a copy of the report.  Further, most of the
measurements I made were reasonably representative technical measures of
performance, but they don't give an overall feel for performance.
However, one that I remember may be of interest: kernel relinks on the
Tower usually took close to 3 minutes.  With the DPT, that dropped to a
little over 2 minutes.  Big hairy deal...  However, further examination of
the "time" results showed that the I/O component *completely* disappeared.
Like wow.  Some other simple benchmarks showed overall performance
increases of up to a factor of 15!

The only way to make the DPT system work better would be to make some major
changes to fio.c/bio.c and a couple of minor mods to the DPT.  For example:
multiple lower-priority look-ahead threads based on file block ordering, and
explicitly cancellable I/Os or look-aheads.  There's more, but I forget now.
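Item 5 and the wish list above both treat look-ahead as speculative, low-priority work that real reads can pre-empt. The following is a minimal sketch of such a request queue, with hypothetical names: demand reads always run first, a new demand read cancels any queued read-ahead, and a read-ahead can also be cancelled explicitly. It does not model multiple look-ahead threads or file block ordering, and it is not how the DPT is actually implemented.

```python
from collections import deque


class RequestQueue:
    """Hypothetical controller queue: demand reads pre-empt read-ahead."""

    def __init__(self):
        self.demand = deque()     # real read requests from the host
        self.readahead = deque()  # speculative look-ahead requests
        self.cancelled = 0        # read-aheads dropped so far

    def submit_readahead(self, blkno):
        self.readahead.append(blkno)

    def submit_read(self, blkno):
        # A real read cancels all queued read-ahead so it never waits
        # behind speculative I/O.
        self.cancelled += len(self.readahead)
        self.readahead.clear()
        self.demand.append(blkno)

    def cancel(self, blkno):
        # Explicit cancellation of one pending read-ahead.
        try:
            self.readahead.remove(blkno)
        except ValueError:
            return False
        self.cancelled += 1
        return True

    def next_request(self):
        # Demand reads always go first; read-ahead runs only when idle.
        if self.demand:
            return ("read", self.demand.popleft())
        if self.readahead:
            return ("readahead", self.readahead.popleft())
        return None


q = RequestQueue()
for b in (11, 12, 13):
    q.submit_readahead(b)
print(q.cancel(12))      # True
print(q.next_request())  # ('readahead', 11)
q.submit_read(40)        # cancels the queued read-ahead of block 13
print(q.next_request())  # ('read', 40)
print(q.next_request(), q.cancelled)  # None 2
```

Cancelling every queued read-ahead on each demand read is the bluntest possible policy; a smarter controller might keep read-ahead that lies just past the block being demanded.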

The DPT also has some other niceties: automatic bad-block sparing,
single-command format/bad-blocking, statistics retrieval, and, in my case,
compatibility with dumb SCSI controllers apart from the additional
features (the NCR Tower SCSI driver has this neat "issue this chunk of
memory as a SCSI command and give me the result" ioctl).

Neat stuff, the DPT.

[No, I don't work for, nor have I ever worked for DPT.  Hi Tom!]
-- 
Chris Lewis, R.H. Lathwell & Associates: Elegant Communications Inc.
UUCP: {uunet!mnetor, utcsri!utzoo}!lsuc!eci386!clewis
Phone: (416)-595-5425