Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!bloom-beacon!usc!elroy!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: DMA on RISC-based systems
Message-ID: <26855@ames.arc.nasa.gov>
Date: 12 Jun 89 14:18:24 GMT
References: <26636@ames.arc.nasa.gov> <8327@killer.DALLAS.TX.US>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 110

In article <8327@killer.DALLAS.TX.US> elg@killer.DALLAS.TX.US (Eric Green) writes:
>in article <26636@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) says:
>> mainframes, I have seen single applications which *averaged* 3 MB/sec on
>> 4.5 MB/sec channels on 8 simultaneous data streams.
>Which particular mainframes? Sounds like something a Cray could do...

This exact performance figure is from a Cyber 205, but I have seen
similar performance on Crays (not quite as good *then*, but it should
be better now because of faster disks: newer disks run at a transfer
rate of ~100 Mbits/sec, as opposed to the older 36 Mbits/sec disks).
Also, I expect large IBM mainframes to do almost as well. Although the
disk transfer rate is not as high, the disk controller to channel
connection runs at 4.5 MBytes/sec on some models.

(*Aside*)
These I/O rates are not particularly high by mainframe standards, just
by mini/micro standards. There used to be a rule of thumb that, for
balance, a system should have a constant ratio of 1 MIPS : 1 MByte of
memory : 1 MByte/sec of I/O. The last term was slightly nebulous, but
it was usually interpreted as channels capable of that rate and disks
capable of sustaining reads at that rate. It was also considered a
"good idea" if disk and channel utilization was less than 5% of raw
aggregate capacity, in order to guarantee that the disk subsystem was
not the bottleneck. I actually did a study once and found that one
heavily used (i.e. many users) system here actually *averaged*
15 KB/sec/MIP.
This (mainframe) system was capable of at least 0.5 MB/sec/MIP of I/O.
That works out to about 3% utilization (15 KB/sec against 0.5 MB/sec),
which helped make the CPU the bottleneck. Disk I/O is the usual
bottleneck on mini/micro systems. This is not necessarily a "problem";
it is just a system design and configuration tradeoff.
(*end Aside*)

On a Cray, if you have an SSD, your I/O rate can run a *lot* faster
than the above disk rates.

>very little overhead there at all (don't have to cope with memory
>protection, can DMA straight into the user's data space without

Yes, this is part of the reason such rates can be sustained. These
rates were always with data copied directly into user memory. I note
that there is a way to do this on some Unix systems: a facility to map
virtual memory to files. Then "paging" can potentially move the data
directly into memory without copying. This is the case where virtual
memory actually helps. Most of the time it doesn't matter one way or
the other for this problem.

>worrying about how "real" memory maps into the user's "virtual"
>memory, etc.).

Anyway, the Cyber 205 is a virtual memory machine. VM has nothing to do
with it specifically. The cost of copying large blocks of data is much
less on a Cray or Cyber 205/ETA machine because block data copies are
done at vector rate, and there is enough memory bandwidth available to
sustain such rates. Crays have memory protection, and the operating
system still has to figure out which real memory addresses the user's
memory buffers are in. It takes a few microseconds to do this either
way, virtual or not. These operations were actually faster on the
Cyber 205 than on the Cray X-MP/48, for various reasons.

The cost of an I/O operation has generally been in figuring out where
the data is on disk and in initiating and sustaining the transfer. The
Cyber 205 did this quickly because the hardware had *very* capable
controllers which did all the cylinder/track/sector mapping and
presented a simple blockserver interface to the operating system.
(The 205 did not have the complicated "channel program" problem that
IBMs have, because this overhead was all handled in the controllers.)

>Sounds to me like another speed reason for Crays to not have virtual
>memory :-) (for the old veterans of past comp.arch discussions). Have

It sounds like a reason for systems to support fast I/O to me :-)

1) parallel I/O paths to memory (aka "channels")
2) fast disks
3) low overhead to do a raw disk operation
4) lots of memory bandwidth
5) operating systems which support multiple asynchronous I/O requests
6) operating systems which support transfer of data directly into
   user memory without being buffered elsewhere

**********************************************************************

I have an actual number to present here: I have seen a significant
number of applications which can only do about 20 floating point
operations per word of I/O, unless the entire problem can be contained
in memory. The memory required for the entire problem is in the range
of 1 Million Words for every 1 to 10 MFLOPS. So, a single job running
at ~100 MFLOPS may need about 800 MBytes, *or* the ability to do I/O at
a rate of 40 MBytes/sec. The single job referred to earlier was running
at about 200 MFLOPS on a Cyber 205 and needed about 50 MBytes/sec of
I/O (it didn't get it; it only got ~24 MBytes/sec). I do not remember
exactly how much memory was needed, but it was significantly more than
32 MW (256 MBytes). You have to look at the requirements of the entire
problem before you can say what your system requirements are.

**********************************************************************

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117
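
The sizing arithmetic above (20 floating point operations per word of I/O, 1 Million Words per 1 to 10 MFLOPS) can be checked with a short sketch. It assumes 64-bit (8-byte) words, which is consistent with the 32 MW = 256 MBytes conversion in the text. The function names are made up for illustration, and the ratios are the rules of thumb quoted above, not measured constants.

```python
# Assumption: 8-byte words, consistent with "32 MW (256 MBytes)" in the post.
BYTES_PER_WORD = 8


def io_rate_mbytes_per_sec(mflops, flops_per_word=20):
    """I/O rate (MBytes/sec) needed when the problem is not memory contained."""
    mwords_per_sec = mflops / flops_per_word
    return mwords_per_sec * BYTES_PER_WORD


def memory_range_mbytes(mflops):
    """(low, high) memory in MBytes from '1 MWord per 1 to 10 MFLOPS'."""
    low = mflops / 10 * BYTES_PER_WORD   # 1 MWord per 10 MFLOPS
    high = mflops / 1 * BYTES_PER_WORD   # 1 MWord per 1 MFLOPS
    return low, high


def flops_per_word(mflops, mbytes_per_sec):
    """Floating point operations per word of I/O implied by a given job."""
    return mflops / (mbytes_per_sec / BYTES_PER_WORD)


print(io_rate_mbytes_per_sec(100))   # 40.0
print(memory_range_mbytes(100))      # (80.0, 800.0)
print(flops_per_word(200, 50))       # 32.0
```

The 100 MFLOPS case reproduces the 40 MBytes/sec and (upper-bound) 800 MBytes figures. The 200 MFLOPS job needing about 50 MBytes/sec works out to roughly 32 floating point operations per word under the same word-size assumption, so the figure of 20 is best read as a rough rule rather than a fixed constant.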