Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!sun-barr!texsun!texbell!killer!elg
From: elg@killer.DALLAS.TX.US (Eric Green)
Newsgroups: comp.arch
Subject: Re: DMA on RISC-based systems
Message-ID: <8327@killer.DALLAS.TX.US>
Date: 10 Jun 89 00:18:27 GMT
References: <26636@ames.arc.nasa.gov>
Organization: The Unix(R) Connection, Dallas, Texas
Lines: 62

in article <26636@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) says:
>>I have performed a small test on a DECstation3100 with a RZ55-230 Mb disk.
> 
>>Write 15x2 Mb: 113 s, 265 kb/s
>>Read 15x2 Mb:  117 s, 256 kb/s
>>Read 15x2 + write 15x2 Mb (new and in parallell): 281 s, 213 kb/s
>>Mean value: 234 kb/s
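
As a quick sanity check on those figures, here is a small sketch that
recomputes them. It assumes "Mb" means 10^6 bytes, "kb" means 10^3
bytes, and that rates were truncated rather than rounded (the quoted
post states none of this; these assumptions are just the ones that
reproduce its numbers). The names rate_kb_per_s and runs are made up
for illustration.

```python
def rate_kb_per_s(megabytes, seconds):
    # Assumes 1 Mb = 1000 kb; truncates to a whole kb/s.
    return megabytes * 1000 // seconds

# (label, megabytes moved, elapsed seconds), taken from the quoted test
runs = [
    ("write 15x2", 30, 113),
    ("read 15x2", 30, 117),
    ("read + write", 60, 281),
]

for label, mb, secs in runs:
    print(label, rate_kb_per_s(mb, secs), "kb/s")   # 265, 256, 213

# The quoted "mean value" matches total data over total time,
# not the plain average of the three rates.
total_mb = sum(mb for _, mb, _ in runs)
total_s = sum(secs for _, _, secs in runs)
overall = rate_kb_per_s(total_mb, total_s)
print("overall", overall, "kb/s")                   # 234
```

So the 234 kb/s is 120 Mb moved in 511 s, under the assumptions above.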

Note that this is probably not an accurate measure of disk drive
bandwidth at all. Unix (at least older AT&T versions) DMAs its data
into the disk cache, then has the CPU manually copy it into the user's
own buffer. With a plain-jane ST157N and a non-DMA SCSI controller
pushed by a plain old 8 MHz 68000, I get 550K/second (at least until my
disk gets fragmented). And there are still visible pauses where the
68000 takes a while to digest the data. Another (DMA) disk controller
gets 650K/second out of the same disk drive (of course, a 68020 or
faster processor wouldn't have run out of steam like my 68000, so this
isn't really an argument that DMA is better than CPU-driven I/O).

Strangely enough, I have never seen anything on preferential caching
schemes for file systems. You'd want to cache small I/O requests, as
is currently done... but what about the scientific types who want to
stream in a few megabytes of data, crunch on it, then stream it back
out -- as fast as possible? That'd blow any reasonable cache to
pieces.  You'd want to DMA it straight into the user's memory. Or even
use CPU-driven IO straight into the user's memory... you'd still come
out at least as well as the traditional DMA-it-to-cache-then-copy-it.
Thinking on it a bit, seems you'd want to cache only small I/O
requests that don't overwhelm the amount of cache you have, while
DMA'ing large I/O requests straight into the user's memory ASAP. That
way crontab, whotab, and other small files hit fairly often would stay
cached longer. An interesting problem...
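
To make the idea concrete, here is a toy sketch of such a policy. It is
not how any real Unix buffer cache works; the class BufferCache, the
block size, and the bypass threshold are all hypothetical, and the
"disk" is just a stand-in function. Small reads go through an LRU block
cache (and pay the cache-to-user copy); reads at or above the threshold
skip the cache entirely, so they cannot evict the small, hot files.

```python
from collections import OrderedDict

BLOCK = 1024  # hypothetical block size in bytes


class BufferCache:
    """Toy LRU block cache that is bypassed for large requests."""

    def __init__(self, capacity_blocks, bypass_blocks):
        self.capacity = capacity_blocks   # blocks the cache can hold
        self.bypass = bypass_blocks       # requests this large skip the cache
        self.blocks = OrderedDict()       # block number -> data, in LRU order
        self.copies = 0                   # blocks copied from cache to user

    def _disk_read(self, blkno):
        # Stand-in for the real transfer (DMA or CPU-driven).
        return bytes([blkno % 256]) * BLOCK

    def read(self, first, count):
        """Return `count` blocks starting at block number `first`."""
        if count >= self.bypass:
            # Large request: transfer straight to the caller's buffer
            # and leave the cached blocks alone.
            return b"".join(self._disk_read(b)
                            for b in range(first, first + count))
        out = []
        for b in range(first, first + count):
            data = self.blocks.get(b)
            if data is None:
                data = self._disk_read(b)
                self.blocks[b] = data
                if len(self.blocks) > self.capacity:
                    self.blocks.popitem(last=False)  # evict least recent
            else:
                self.blocks.move_to_end(b)           # mark as recently used
            out.append(data)
            self.copies += 1                         # cache -> user copy
        return b"".join(out)


cache = BufferCache(capacity_blocks=8, bypass_blocks=16)
cache.read(0, 2)                    # small file: cached
big = cache.read(1000, 64)          # streaming read: bypasses the cache
print(sorted(cache.blocks))         # [0, 1], the small file survived
```

A real implementation would also have to keep the bypass path coherent
with any dirty blocks already sitting in the cache, which this sketch
ignores since it only reads.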

I suppose it irritates the designers of these disk subsystems that all
their beautiful bandwidth is chewed to shreds by OS overhead. 

> On
> mainframes, I have seen single applications which *averaged* 3 MB/sec on
> 4.5 MB/sec channels on 8 simultaneous data streams.

Which particular mainframes? Sounds like something a Cray could do...
very little overhead there at all (don't have to cope with memory
protection, can DMA straight into the user's data space without
worrying about how "real" memory maps into the user's "virtual"
memory, etc.).

Sounds to me like another speed reason for Crays to not have virtual
memory :-) (for the old veterans of past comp.arch discussions). You
have to consider all aspects of the architecture, including disk
subsystem performance, not just what it looks like from a user or CPU
point of view.

> So, the ratios quoted seem reasonable to me.

Yes, seems reasonable to me too. But somewhat sad, considering the
performance that the hardware is capable of.

--
    Eric Lee Green              P.O. Box 92191, Lafayette, LA 70509     
     ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg     (318)989-9849    
"I have seen or heard 'designer of the 68000' attached to so many names that
 I can only guess that the 68000 was produced by Cecil B. DeMille." -- Bcase