Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!dg!rec
From: rec@dg.dg.com (Robert Cousins)
Newsgroups: comp.arch
Subject: Re: DMA on RISC-based systems
Message-ID: <189@dg.dg.com>
Date: 8 Jun 89 13:29:07 GMT
References: <46500067@uxe.cso.uiuc.edu> <181@dg.dg.com> <1989May31.163057.543@utzoo.uucp> <3480@orca.WV.TEK.COM> <185@dg.dg.com> <620@biar.UUCP> <41042@bbn.COM>
Reply-To: rec@dg.UUCP (Robert Cousins)
Organization: Data General, Westboro, MA.
Lines: 49

In article <41042@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>In article <620@biar.UUCP> jhood@biar.UUCP (John Hood) writes:
>>Also note that with modern operating systems that do buffering or disk
>>caching, there is going to be a bcopy or its moral equivalent in there
>>somewhere.  
>
>1) Is it possible, if not now but possibly in the future, for programmed
>   I/O to _eliminate_ some of the 'bcopy's?

Some already do, for paging and program loads.  Some Unix DBMS products
also bypass the file system and go through the raw character drivers
straight to the disks for performance reasons (bypassing the sector
cache).  While I don't know the implementation details, I do know that
this has done substantial good for standard DBMS jobs.  The reason is
that many character drivers for disks do DMA directly into user space,
so the bcopy out of the buffer cache disappears.
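
As a rough sketch of what the raw-device path looks like from the
program's side: one seek and one unbuffered read() into a buffer the
caller owns.  The function name and the 512-byte block size are
assumptions for illustration.  Whether the driver really does DMA into
that buffer depends on the driver, and a real DBMS would use the
buffer in place rather than copy out of it as the last step here does.

```python
import mmap
import os

def read_raw_block(path, block_no, block_size=512):
    """Read one block with a single unbuffered read into a caller-owned buffer.

    `path` would be a raw (character) disk device; any readable file
    works for trying the sketch out.
    """
    # An anonymous mmap is page-aligned, which raw drivers commonly
    # require of a transfer target.
    buf = mmap.mmap(-1, block_size)
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, block_no * block_size, os.SEEK_SET)
        got = os.readv(fd, [buf])      # the kernel fills buf directly
        return bytes(buf[:got])        # copy out only for convenience
    finally:
        os.close(fd)
        buf.close()
```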

>2) This discussion brings to mind one that went around some time ago, 
>   which was, is it better to supply a bunch of specialized processors
>   (then bitblt's, now including DMA controllers), or a bunch of identical
>   processors connected together?  Theory was, when the bitblt and DMA are
>   done, the other processor(s) can be applied to a compute bound task.
>   It seems to me this might make an interesting product; price/perf
>   range is varied by the number of [identical] processors, and all I/O
>   hardware is very very dumb.

In an earlier life, I headed up the development of just such a machine,
the CSI-150, which supported up to 32 V30 CPUs.  Each CPU could be
connected to a private SCSI channel and could do about 1.25 mb/s on
it.  It didn't catch on, but boy could it handle some classes of
I/O-based jobs!  Each CPU ran in its own private memory and sent
messages to other CPUs.  The operating system was designed so that the
file systems were locally managed and cached in each CPU, so the messages
were higher-level requests similar to NFS or RFS today.
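
To illustrate the idea only (this is not the CSI-150 message format,
which isn't described here): a request names a file and a byte range,
and the CPU that owns the file system answers from its own cache,
going to its private disk only on a miss.  ReadRequest, FileNode and
the dictionary standing in for the disk are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ReadRequest:
    """A high-level message: 'give me these bytes of this file'."""
    path: str
    offset: int
    length: int

@dataclass
class FileNode:
    """One CPU that manages and caches its own file system."""
    disk: dict                              # path -> bytes, stands in for the local disk
    cache: dict = field(default_factory=dict)
    disk_reads: int = 0

    def handle(self, req):
        if req.path not in self.cache:      # miss: go to the private disk once
            self.disk_reads += 1
            self.cache[req.path] = self.disk[req.path]
        data = self.cache[req.path]
        return data[req.offset:req.offset + req.length]

node = FileNode(disk={"/etc/motd": b"hello world"})
reply = node.handle(ReadRequest("/etc/motd", 6, 5))   # b"world"
```

The requester never sees blocks or sectors, so only the owning CPU has
to keep the cache consistent.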

We did have one additional problem: the system supported exactly one user
per CPU.  This meant that the CRTs could be driven at 38.4 Kbps all day
long, since they effectively had a dedicated CPU to drive them.  There
were very few CRTs which could keep up with 19.2 Kbps, much less 38.4.
We found that at 38.4, most CRTs couldn't even manage to send ^S out to
shut off transmission!
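
To put rough numbers on that, assume 10 bits on the wire per character
(start bit, 8 data bits, stop bit; the framing is an assumption, not
something stated above).  The 50 ms reaction time below is purely
illustrative: it stands for however long a terminal takes to notice
its buffer filling and get ^S back to the host.

```python
BITS_PER_CHAR = 10  # assumed 8N1 framing: 1 start + 8 data + 1 stop bit

def chars_per_second(bps, bits_per_char=BITS_PER_CHAR):
    """Characters delivered per second at a given line rate."""
    return bps // bits_per_char

def chars_before_xoff_lands(bps, reaction_ms, bits_per_char=BITS_PER_CHAR):
    """Characters that still arrive while the terminal is reacting."""
    return bps * reaction_ms // (bits_per_char * 1000)

for rate in (19200, 38400):
    print(rate, chars_per_second(rate), chars_before_xoff_lands(rate, 50))
# 19200 1920 96
# 38400 3840 192
```

Under those assumptions a terminal needs nearly 200 characters of
spare buffer at 38.4 just to survive its own ^S latency.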

>-Stan

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.