Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site kobold.UUCP
Path: utzoo!linus!security!genrad!grkermit!masscomp!kobold!tjt
From: tjt@kobold.UUCP
Newsgroups: net.unix
Subject: Re: Semi-conductor Disk for VAX - (nf)
Message-ID: <248@kobold.UUCP>
Date: Wed, 11-Jan-84 14:50:39 EST
Article-I.D.: kobold.248
Posted: Wed Jan 11 14:50:39 1984
Date-Received: Thu, 12-Jan-84 04:17:13 EST
References: <2238@fortune.UUCP>
Organization: Masscomp, Westford, MA
Lines: 45

Rob Warnock points out that many of the CPU chips (the Motorola 68000 in particular) are memory bound, i.e. most CPU cycles are devoted to accessing memory, and the CPU cycle time is as fast as the cycle time of the fastest 64K RAMs. This is certainly an important consideration for some systems.

I guess the break-even point is where the time lost to switching bus masters equals the CPU time required to copy the data in a loop, exclusive of the time used to actually move the data.

At the very best, the overhead of *copying* the data is at least 100% since *two* memory cycles are required -- one to read the data from the dual-ported RAM and one to write the data to where you want it. In addition, there is the overhead of keeping count of how many bytes (words) to transfer, incrementing pointers, and looping. This results in about 50% more overhead. On a 68000 you can unroll your loops, or on a 68010 you can make a two-instruction loop using the DBcc instruction. Using the DBcc instruction takes 22 ticks to move 4 bytes of data, which would require 8 ticks if you could just write it there in the first place. You can do just as well on a 68000 by unrolling your loop by e.g. a factor of eight. Each move instruction then takes 20 ticks; each pass through the loop adds 8 ticks to update the counter and 10 ticks for the branch, but these 18 ticks of overhead get distributed over 8 move instructions to give 20 + 18/8 = 22.25 ticks per move.
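
As a rough sketch of the unrolling arithmetic, the per-move cost can be tabulated for several unroll factors. The tick counts are the ones quoted above; the function and parameter names are made up for illustration, and the model ignores setup time and anything else the loop might do.

```python
# Tick counts are the figures quoted in the post; the names are
# hypothetical and exist only for this illustration.
MOVE_TICKS = 20      # one 4-byte move instruction
COUNTER_TICKS = 8    # updating the loop counter
BRANCH_TICKS = 10    # the branch back to the top of the loop
DIRECT_TICKS = 8     # writing 4 bytes straight to the destination


def ticks_per_move(unroll, move=MOVE_TICKS, counter=COUNTER_TICKS,
                   branch=BRANCH_TICKS):
    """Average ticks per 4-byte move when the loop body holds `unroll` moves."""
    if unroll < 1:
        raise ValueError("unroll factor must be at least 1")
    # The counter update and branch are paid once per pass, so their
    # cost is shared by every move in the unrolled body.
    return move + (counter + branch) / unroll


if __name__ == "__main__":
    for unroll in (1, 2, 4, 8, 16):
        cost = ticks_per_move(unroll)
        print(f"unroll {unroll:2d}: {cost:6.2f} ticks per move, "
              f"{cost / DIRECT_TICKS:.2f}x a direct write")
```

With an unroll factor of 8 this gives the same 22.25 ticks per move as the hand calculation.
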
Using a DBcc instruction for the loop control would take even less time, but would require more time to set up the counter initially.

It should be noted that there are architectural solutions to the problem of a memory-bound CPU that can be taken at the system level. In particular, using a cache between the CPU and memory bus is a well-known technique for constructing a memory system whose average speed is nearly as fast as the fastest semiconductor RAM you care to use but whose average cost is only slightly more than that of cheaper (but slower) RAM. In addition, using a cache will result in fewer memory cycles on the bus, so there is less contention between the CPU and a DMA controller. It is also possible to increase the memory bandwidth available to the cache by using a larger word size on a private memory bus.

Once again, it is necessary to consider an entire system and tailor the software and peripherals to a particular CPU design since, as Rob notes: "Things aren't always what they seem".
--
	Tom Teixeira,  Massachusetts Computer Corporation.  Westford MA
	...!{ihnp4,harpo,decvax,ucbcad,tektronix}!masscomp!tjt   (617) 692-6200