Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!dg!rec
From: rec@dg.dg.com (Robert Cousins)
Newsgroups: comp.arch
Subject: Re: DMA on RISC-based systems
Message-ID: <182@dg.dg.com>
Date: 31 May 89 13:01:21 GMT
References: <46500067@uxe.cso.uiuc.edu> <1989May26.170247.1165@utzoo.uucp> <1552@softway.oz>
Reply-To: rec@dg.UUCP (Robert Cousins)
Organization: Data General, Westboro, MA.
Lines: 92

In article <1552@softway.oz> chris@softway.oz (Chris Maltby) writes:
>In article <1989May26.170247.1165@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>> Having the CPU do the copying is not an obviously *un*reasonable idea.
>> Much depends on the details.
>> DMA historically was more popular than auxiliary memory because memory was
>> expensive. This is no longer true.
>
>Of course, there are many benefits that can be gained by having controllers
>with their own buffers. Disk drivers can stop worrying about rotational
>placement if the disk controller is providing whole tracks or cylinders
>at a time for no extra bus overhead. LAN drivers can avoid copying stuff
>like protocol headers etc into and out of main memory.

The same or similar tricks can generally be played using DMA. However,
there are certain penalties paid for using buffers:

1.  Additional latency -- effectively, disk or LAN devices perform
    DMA operations into their own buffers. After this, the CPU
    must perform a copy into main memory. Since these peripheral
    buffers are not cached (or if they are, then there is no excuse
    for not copying into main memory to begin with), the copy will
    be more expensive. There are already several versions of Unix
    which directly page programs from disk to user code space.
    The use of a dedicated buffer will substantially slow this down.
    Future versions of Unix may choose to take advantage of these
    features in greater ways for performance enhancements. The
    bottom line is that this approach requires an additional copy,
    which can make CPU latency a problem.

2.  Buffer size -- provision of a private buffer for a peripheral
    implies that the driver must now manage the buffer memory.
    Since certain classes of peripherals such as Ethernet can
    have semi-continuous traffic, this management must be timely
    and efficient. The CPU must be able to drain the buffer in
    a short period of time (which can be a problem under standard
    Unix due to the design of the dispatcher). The easiest way
    to handle this is to provide a LARGE buffer to store the data.
    So, at this point in time, one must ask oneself: "Would I
    rather have 4 megabytes of dedicated LAN buffer or 4 megabytes
    of additional main memory?" Most people would rather have
    the main memory.

3.  Architectural generality -- There are a variety of cases where
    having the data "beamed down" into main memory is useful though
    not strictly required. In tightly coupled multiprocessors
    (TCMPs) it is convenient to avoid excessive data movement and
    to simplify the driver to minimize the time in which a particular
    device's code is single threaded.

The real reason why some machines avoid DMA is CPU braindamage.
Many CPUs are either poorly cached (causing them to demand too much bus
bandwidth and therefore to suffer major performance loss when minor
peripherals begin to take bus cycles) or have defective architectures which
do not support cache coherency (or at least do not support it effectively).
Some examples of the first include some of the low end microprocessors which
can take 100% of the CPU bandwidth for extended periods of time. Some
examples of the second include some of the higher end microprocessors with
on-chip caches or cache controllers.

A number of DMA buffer workarounds have been used over the years. One
favorite hack is to provide a hole in the cache coverage so that some
areas of memory are not cached. In one form or another almost every system
provides for this. Sometimes it is on a page by page basis (88K for example).
Others create a dedicated area of memory for it (MIPS).

>Generally, the CPU can be a lot smarter about I/O than any brain-damaged
>microprocessor controlled device interface.

However, just remember that you are throwing MIPS away doing the copying.
I would rather have a $5 DMA controller spending the time than my high
powered CPU. Sure, it works to use the CPU to do the copying, but when you
realize the amount of time the CPU may be forced to spend because of the
copy (including extra interrupt service, context switches, polling loops,
cache flushes, etc.), it often turns out that a DMA controller can provide
the user with VERY CHEAP MIPS by freeing up the CPU.

It is this logic which allows people to avoid using graphics processors in
workstations by saying "the CPU is fast, therefore I don't need one."

>--
>Chris Maltby - Softway Pty Ltd (chris@softway.sw.oz)
>
>PHONE: +61-2-698-2322 UUCP: uunet!softway.sw.oz.au!chris
>FAX: +61-2-699-9174 INTERNET: chris@softway.sw.oz.au

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.
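
For anyone who wants to put rough numbers on points 1 and 2 above, here
is a back-of-the-envelope sketch in Python. It is only an illustration,
not a measurement: the function names are made up for this example, the
10 Mbit/s figure is the nominal Ethernet rate, and the 100 ms drain
latency and 5 Mbyte/s uncached copy rate are purely hypothetical values
to be replaced with figures from a real system.

```python
def buffer_bytes_needed(link_bits_per_sec: float, drain_latency_sec: float) -> float:
    """Bytes that arrive on the link while the CPU cannot drain the buffer."""
    if link_bits_per_sec < 0 or drain_latency_sec < 0:
        raise ValueError("rates and latencies must be non-negative")
    return link_bits_per_sec / 8.0 * drain_latency_sec


def cpu_fraction_copying(data_bytes_per_sec: float, copy_bytes_per_sec: float) -> float:
    """Fraction of CPU time spent copying at a sustained data rate.

    Ignores interrupt service, context switches and cache flushes,
    so it understates the real cost of a CPU copy.
    """
    if copy_bytes_per_sec <= 0:
        raise ValueError("copy rate must be positive")
    return data_bytes_per_sec / copy_bytes_per_sec


if __name__ == "__main__":
    ETHERNET_BITS_PER_SEC = 10_000_000      # nominal 10 Mbit/s Ethernet
    ASSUMED_DRAIN_LATENCY_SEC = 0.1         # hypothetical worst-case dispatch delay
    ASSUMED_COPY_BYTES_PER_SEC = 5_000_000  # hypothetical copy rate from an uncached buffer

    needed = buffer_bytes_needed(ETHERNET_BITS_PER_SEC, ASSUMED_DRAIN_LATENCY_SEC)
    share = cpu_fraction_copying(ETHERNET_BITS_PER_SEC / 8.0, ASSUMED_COPY_BYTES_PER_SEC)
    print(f"buffer needed to ride out the latency: {needed:.0f} bytes")
    print(f"CPU share spent copying at full link rate: {share:.0%}")
```

The first function is the "how LARGE must the buffer be" question from
point 2; the second is the "MIPS thrown away" question, and it leaves out
every overhead except the copy itself.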