Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!dg!rec
From: rec@dg.dg.com (Robert Cousins)
Newsgroups: comp.arch
Subject: Re: DMA on RISC-based systems
Message-ID: <182@dg.dg.com>
Date: 31 May 89 13:01:21 GMT
References: <46500067@uxe.cso.uiuc.edu> <1989May26.170247.1165@utzoo.uucp> <1552@softway.oz>
Reply-To: rec@dg.UUCP (Robert Cousins)
Organization: Data General, Westboro, MA.
Lines: 92

In article <1552@softway.oz> chris@softway.oz (Chris Maltby) writes:
>In article <1989May26.170247.1165@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>> Having the CPU do the copying is not an obviously *un*reasonable idea.
>> Much depends on the details.
>> DMA historically was more popular than auxiliary memory because memory was
>> expensive.  This is no longer true.
>
>Of course, there are many benefits that can be gained by having controllers
>with their own buffers. Disk drivers can stop worrying about rotational
>placement if the disk controller is providing whole tracks or cylinders
>at a time for no extra bus overhead. LAN drivers can avoid copying stuff
>like protocol headers etc into and out of main memory.

The same or similar tricks can generally be played using DMA.  However,
there are certain penalties paid for using buffers:

	
1.	Additional latency -- effectively, disk or LAN devices perform
	DMA operations into their own buffers.  After this, the CPU must
	perform a copy into main memory.  Since these peripheral buffers
	are not cached (or if they are, then there is no excuse
	for not copying into main memory to begin with), the copy will
	be more expensive.  There are already several versions of Unix 
	which directly page programs from disk to user code space.  The
	use of a dedicated buffer will substantially slow this down.  
	Future versions of Unix may choose to take advantage of these
	features in greater ways for performance enhancements.  The bottom 
	line is that this approach requires an additional copy which 
	can make CPU latency a problem.

2.	Buffer size -- provision of a private buffer for a peripheral
	implies that the driver must now manage the buffer memory.
	Since certain classes of peripherals such as Ethernet can have
	semi-continuous traffic, this management must be timely and
	efficient.  The CPU must be able to drain the buffer in a
	short period of time (which can be a problem under standard
	Unix due to the design of the dispatcher).  The easiest way
	to handle this is to provide a LARGE buffer to store the data.
	So, at this point in time, one must ask oneself:  "Would I rather
	have 4 megabytes of dedicated LAN buffer or 4 megabytes of additional
	main memory?"  Most people would rather have the main memory.

3.	Architectural generality -- There are a variety of cases where
	having the data "beamed down" into main memory is useful though
	not strictly required.  In tightly coupled multiprocessors (TCMPs)
	it is convenient to avoid excessive data movement and to simplify
	the driver to minimize the time in which a particular device's
	code is single threaded.  
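
To put rough numbers on point 2, here is a minimal sketch of the
buffer-sizing arithmetic.  The 10 Mbit/s line rate, the assumption of a
fully loaded wire, and both function names are illustrative and not
taken from any particular controller.

```python
# Assumption: 10 Mbit/s Ethernet running flat out, no framing overhead counted.
ETHERNET_BYTES_PER_SEC = 10_000_000 // 8


def buffer_bytes_needed(rate_bytes_per_sec, worst_drain_delay_sec):
    """Bytes that pile up in the peripheral buffer while the CPU is away."""
    return rate_bytes_per_sec * worst_drain_delay_sec


def tolerable_delay_sec(buffer_bytes, rate_bytes_per_sec):
    """Longest time the CPU can ignore the buffer before it overflows."""
    return buffer_bytes / rate_bytes_per_sec


four_mb = 4 * 1024 * 1024
delay = tolerable_delay_sec(four_mb, ETHERNET_BYTES_PER_SEC)
needed = buffer_bytes_needed(ETHERNET_BYTES_PER_SEC, 0.1)  # 100 ms stall
print(f"4 MB of buffer rides out about {delay:.2f} s without service")
print(f"a 100 ms stall needs about {needed:.0f} bytes of buffer")
```

Under these assumptions the buffer only has to cover the worst-case
time the dispatcher keeps the driver from running, so the memory is
being spent to paper over scheduling latency rather than to hold data
anyone wants kept.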

The real reason some machines avoid DMA is CPU brain damage.
Many CPUs are either poorly cached (so they demand too much bus
bandwidth and suffer major performance loss when even minor
peripherals begin to take bus cycles) or have defective architectures
which do not support cache coherency (or at least do not support it
effectively).

Some examples of the first include some of the low end microprocessors
which can take 100% of the bus bandwidth for extended periods of time.

Some examples of the second include some of the higher end microprocessors
with on-chip caches or cache controllers.  

A number of DMA buffer workarounds have been used over the years.  One
favorite hack is to provide a hole in the cache coverage so that some
areas of memory are not cached.  In one form or another almost every
system provides for this.  Sometimes it is on a page-by-page basis (88K
for example).  Other systems create a dedicated area of memory for it (MIPS).
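
A toy model may make the coherency problem and the "hole in the cache"
workaround concrete.  This is only a sketch: the ToySystem class, its
method names, and the addresses are hypothetical, and the cache is a
plain dictionary that never snoops the bus, which is far simpler than
any real cache.

```python
class ToySystem:
    """Main memory plus a CPU-side cache that never snoops (hypothetical model)."""

    def __init__(self, size, uncached=range(0)):
        self.memory = bytearray(size)
        self.cache = {}            # address -> byte the CPU has cached
        self.uncached = uncached   # addresses the cache is never allowed to hold

    def cpu_read(self, addr):
        if addr in self.uncached:
            return self.memory[addr]              # bypass the cache entirely
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]  # fill the cache on a miss
        return self.cache[addr]

    def dma_write(self, addr, data):
        # The device writes main memory directly; the cache is never told.
        self.memory[addr:addr + len(data)] = data


def demo(uncached):
    system = ToySystem(64, uncached)
    system.cpu_read(16)             # CPU touches the buffer before the I/O
    system.dma_write(16, b"\x7f")   # device deposits new data by DMA
    return system.cpu_read(16)      # what does the CPU see afterwards?


stale = demo(range(0))        # everything cached: CPU still sees the old byte
fresh = demo(range(16, 32))   # DMA buffer sits in an uncached hole
print(stale, fresh)
```

In the first run the CPU reads its stale cached copy; in the second the
read goes to memory and sees what the device wrote.  The price is that
every CPU access to the buffer now runs at uncached speed.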

>Generally, the CPU can be a lot smarter about I/O than any brain-damaged
>microprocessor controlled device interface.

However, just remember that you are throwing MIPS away doing the copying.
I would rather have a $5 DMA controller spending the time than my
high-powered CPU.  Sure, it works to use the CPU to do the copying, but
when you add up the time the CPU may be forced to spend because of
the copy (including extra interrupt service, context switches, polling
loops, cache flushes, etc.), it often turns out that a DMA controller
can provide the user with VERY CHEAP MIPS by freeing up the CPU.  The
"let the CPU do it" argument is the same logic which allows people to
avoid using graphics processors in workstations by saying "the CPU is
fast, therefore I don't need one."
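
The "cheap MIPS" argument can be sketched as a back-of-the-envelope
model.  Every figure below (clock rate, traffic rate, cycles per
uncached access, per-transfer overhead) is made up for illustration,
and both function names are hypothetical.  The model also ignores the
bus cycles the DMA controller itself takes, which matter on a poorly
cached CPU.

```python
def copy_cpu_fraction(bytes_per_sec, bytes_per_access, cycles_per_access, clock_hz):
    """Fraction of CPU cycles spent if the CPU moves every byte itself."""
    accesses_per_sec = bytes_per_sec / bytes_per_access
    return accesses_per_sec * cycles_per_access / clock_hz


def dma_cpu_fraction(bytes_per_sec, bytes_per_transfer, overhead_cycles, clock_hz):
    """Fraction of CPU cycles spent on per-transfer setup and interrupt service."""
    transfers_per_sec = bytes_per_sec / bytes_per_transfer
    return transfers_per_sec * overhead_cycles / clock_hz


# Illustrative numbers only, not measurements of any real machine.
CLOCK_HZ = 20_000_000   # 20 MHz CPU
RATE = 1_250_000        # bytes per second of device traffic

cpu_copy = copy_cpu_fraction(RATE, bytes_per_access=4,
                             cycles_per_access=10, clock_hz=CLOCK_HZ)
with_dma = dma_cpu_fraction(RATE, bytes_per_transfer=1500,
                            overhead_cycles=500, clock_hz=CLOCK_HZ)
print(f"CPU copy: {cpu_copy:.1%} of the CPU, DMA: {with_dma:.1%}")
```

With these made-up inputs the copy loop costs several times what the
DMA bookkeeping does, and the gap widens as the cost of each uncached
access grows.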

>-- 
>Chris Maltby - Softway Pty Ltd	(chris@softway.sw.oz)
>
>PHONE:	+61-2-698-2322		UUCP:		uunet!softway.sw.oz.au!chris
>FAX:	+61-2-699-9174		INTERNET:	chris@softway.sw.oz.au


Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.