Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site kobold.UUCP
Path: utzoo!linus!security!genrad!grkermit!masscomp!kobold!tjt
From: tjt@kobold.UUCP
Newsgroups: net.unix
Subject: Re: Semi-conductor Disk for VAX - (nf)
Message-ID: <248@kobold.UUCP>
Date: Wed, 11-Jan-84 14:50:39 EST
Article-I.D.: kobold.248
Posted: Wed Jan 11 14:50:39 1984
Date-Received: Thu, 12-Jan-84 04:17:13 EST
References: <2238@fortune.UUCP>
Organization: Masscomp, Westford, MA
Lines: 45

Rob Warnock points out that many of the CPU chips (the Motorola 68000
in particular) are memory bound, i.e. most CPU cycles are devoted to
accessing memory, and the CPU cycle time is as fast as the cycle
time of the fastest 64K RAMs.

This is certainly an important consideration for some systems.  I guess
the break-even point is where the time lost to switching bus masters is
greater than the CPU time required to copy data in a loop, exclusive of
the time used to actually move data.  At the very best, the overhead of
*copying* the data is at least 100% since *two* memory cycles are
required -- one to read the data from the dual-ported RAM and one to
write the data to where you want it.  In addition, there is the
overhead of keeping count of how many bytes (words) to transfer,
incrementing pointers, and looping.  This adds roughly another 50% of
overhead.
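
The break-even condition can be put into a few lines of Python.  This
is only a sketch: the function names and the switch_ticks parameter are
made up for illustration, it assumes the bus-master switching cost is
paid once per transfer, and the default tick counts are the 68010
figures quoted later in this article (22 ticks to copy 4 bytes, 8 ticks
to write them directly).

```python
def copy_overhead_ticks(units, copy_ticks=22, direct_ticks=8):
    """Ticks a CPU copy loop spends beyond the unavoidable data movement.

    units        -- number of 4-byte moves in the transfer
    copy_ticks   -- ticks per move when the CPU copies in a loop
    direct_ticks -- ticks per move if the data could be written directly
    """
    return units * (copy_ticks - direct_ticks)


def dma_is_cheaper(units, switch_ticks, copy_ticks=22, direct_ticks=8):
    """True when switching bus masters costs less than the copy overhead.

    switch_ticks is a hypothetical figure; it depends on the bus and the
    DMA controller and is assumed here to be paid once per transfer.
    """
    return switch_ticks < copy_overhead_ticks(units, copy_ticks, direct_ticks)


if __name__ == "__main__":
    # Assumed switching cost of 100 ticks, for illustration only.
    for units in (1, 8, 128):
        print(units, copy_overhead_ticks(units), dma_is_cheaper(units, 100))
```

With an assumed switching cost of 100 ticks, a single 4-byte move is
cheaper to copy by hand, while a 128-move block favors DMA.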

On a 68000 you can unroll your loops, or on a 68010 you can make a
two-instruction loop using the DBcc instruction.  Using the DBcc
instruction takes 22 ticks to move 4 bytes of data which would require
8 ticks if you could just write it there in the first place.  You can
do just as well on a 68000 by unrolling your loop by e.g. a factor of
eight.  Each move instruction then takes 20 ticks, and each pass
through the loop adds 8 ticks to update the counter and 10 ticks for
the branch; these 18 ticks of overhead get distributed over 8 move
instructions to give 20 + 18/8 = 22.25 ticks per move.  Using a DBcc
instruction to close the unrolled loop would take even less time, but
would require more time to set up the counter initially.
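
To see how the loop overhead amortizes for other unrolling factors,
here is a small Python sketch of the same arithmetic.  The function
name is invented for illustration; the default tick counts are the
ones given above for the 68000 (20 per move, 8 for the counter update,
10 for the branch).

```python
def ticks_per_move(unroll, move_ticks=20, counter_ticks=8, branch_ticks=10):
    """Average ticks per 4-byte move in a loop unrolled `unroll` times.

    The counter update and branch are paid once per pass through the
    loop, so their cost is spread over `unroll` move instructions.
    """
    if unroll < 1:
        raise ValueError("unroll factor must be at least 1")
    return move_ticks + (counter_ticks + branch_ticks) / unroll


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, ticks_per_move(n))
```

An unrolling factor of 8 reproduces the 22.25 ticks worked out above;
going to 16 only brings it down to 21.125, so the returns diminish
quickly against the 20-tick floor of the move itself.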

There are also architectural solutions to the problem of a memory-bound
CPU that can be taken at the system level.  In particular, using a cache
between the CPU and memory bus is a well-known technique for
constructing a memory system whose average speed is
nearly as fast as the fastest semiconductor RAM you care to use but
whose average cost is only slightly more than cheaper (but slower) RAM.
In addition, using a cache will result in fewer memory cycles on the
bus so there is less contention between the CPU and a DMA controller.
It is also possible to increase memory bandwidth available to the cache
by using a larger wordsize on a private memory bus.
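
A common first-order model for the average access time with a cache
weights the two access times by the hit ratio.  The Python sketch below
assumes that simple model; the function names, the hit ratio, and the
cycle times are invented for illustration and are not measurements of
any particular machine.

```python
def average_access_ns(hit_ratio, cache_ns, memory_ns):
    """First-order model: hits cost cache_ns, misses cost memory_ns."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


def bus_cycle_fraction(hit_ratio):
    """Fraction of CPU accesses that still reach the memory bus.

    Ignores write traffic and cache fills that take more than one cycle.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return 1.0 - hit_ratio


if __name__ == "__main__":
    # Hypothetical figures: 90% hit ratio, 100 ns cache, 400 ns main memory.
    print(average_access_ns(0.9, 100.0, 400.0))
    print(bus_cycle_fraction(0.9))
```

With these assumed figures the model gives an average of about 130 ns,
and only about one CPU access in ten goes out on the bus, which leaves
the remaining bus cycles free for a DMA controller.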

Once again, it is necessary to consider an entire system and tailor the
software and peripherals to a particular CPU design since, as Rob notes:

	"Things aren't always what they seem".
-- 
	Tom Teixeira,  Massachusetts Computer Corporation.  Westford MA
	...!{ihnp4,harpo,decvax,ucbcad,tektronix}!masscomp!tjt   (617) 692-6200