Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!mailrus!tut.cis.ohio-state.edu!husc6!bbn!oberon!cit-vax!mangler
From: mangler@cit-vax.Caltech.Edu (Don Speck)
Newsgroups: comp.arch
Subject: Re: Maximum MIPS for a given memory bandwidth?
Message-ID: <6955@cit-vax.Caltech.Edu>
Date: 15 Jun 88 09:15:28 GMT
References: <6921@cit-vax.Caltech.Edu> <22050@amdcad.AMD.COM> <291@wombat.UUCP> <22063@amdcad.AMD.COM>
Distribution: na
Organization: California Institute of Technology
Lines: 43

In article <22063@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
>				   I think that average bandwidth
> requirements are much more interesting -- it tells more about the cost
> and complexity of a memory design than the peak rating, and seemed to be
> more in line with what the original poster was asking.

Average bandwidth requirements are the interesting thing for shared-memory
multiprocessors, but I was asking about uniprocessors, where all of the
bandwidth is dedicated to one processor and costs the same to provide
whether the processor uses all of it or not.

I consider caches to be part of the memory system, i.e. part of the
von Neumann bottleneck.

Instead of the ambiguous term "MIPS", I should have said "multiple of
VAX-11/780 speed", but that wouldn't fit in the column headings.
Dhrystones would have been less ambiguous.  I didn't expect the figures
to be accurate enough for the distinction to matter.

So the table is amended as follows:

  Processor          avg read    bus     bandwidth    VAX    MB/s:MIPS
                     latency    width    available   "MIPS"    ratio
25 MHz 88000          45ns?     32+32    185 MB/s?    17        11?
16 MHz MIPSco           ?       32+32    120 MB/s?    10?       13?
40 MHz RPM40         100ns      32+16    240 MB/s     15        16
25 MHz AMD 29000      80ns      32+32    170 MB/s     22         8
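
For anyone who wants to redo the last column, here is a minimal Python
sketch that divides "bandwidth available" by VAX "MIPS" for each row.
The names (TABLE, bytes_per_instruction, ratios) are mine, the "?"
entries are used as given, and reading the quotient as "bytes available
per VAX-equivalent instruction" is my interpretation, so the estimated
rows may not match the rounded table entries exactly.

```python
# Rows copied from the table: (processor, bandwidth available in MB/s, VAX "MIPS").
# Entries marked "?" in the table are estimates and are used here unchanged.
TABLE = [
    ("25 MHz 88000", 185, 17),
    ("16 MHz MIPSco", 120, 10),
    ("40 MHz RPM40", 240, 15),
    ("25 MHz AMD 29000", 170, 22),
]


def bytes_per_instruction(mb_per_s, vax_mips):
    """MB/s divided by millions of instructions per second.

    The quotient is the bandwidth available per VAX-equivalent
    instruction, in bytes (an interpretation, not a measurement).
    """
    return mb_per_s / vax_mips


def ratios(table=TABLE):
    """Return {processor: MB/s:MIPS ratio} for every row of the table."""
    return {name: bytes_per_instruction(bw, mips) for name, bw, mips in table}


if __name__ == "__main__":
    for name, ratio in ratios().items():
        print(f"{name:18s} {ratio:5.1f}")
```

A lower number means less memory bandwidth has to be provided for each
unit of VAX-relative performance.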

The AMD 29000 is remarkably bandwidth-efficient: it has the lowest
MB/s:MIPS ratio in the table, even though it uses (on average) less
than half of the memory cycles available.  (Maybe this points to the
efficacy of their optimizer.)
How much would the 29000 slow down if it had only one 32-bit
path to a combined instruction+data cache, i.e. half as much
peak memory bandwidth available?
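
A rough way to frame that question, as a Python sketch under loud
assumptions: the processor's average demand stays the same, bursts and
instruction/data conflicts are ignored, and the 0.45 used below is a
hypothetical stand-in for "less than half", not a measured figure.
The function name is mine.

```python
def utilization_after_halving(avg_fraction_used, peak_mb_s=170.0):
    """Fraction of the halved peak bandwidth that the same demand would occupy.

    avg_fraction_used: share of available memory cycles used today
                       (hypothetical input; the post only says "< half").
    peak_mb_s:         bandwidth available with both 32-bit paths
                       (170 MB/s is the table's figure for the 29000).
    """
    avg_demand = avg_fraction_used * peak_mb_s  # MB/s actually consumed
    halved_peak = peak_mb_s / 2                 # one combined 32-bit path
    return avg_demand / halved_peak


if __name__ == "__main__":
    # Hypothetical 45% utilization of the two-path memory system.
    print(f"{utilization_after_halving(0.45):.0%} of the single path")
```

Under these assumptions any current utilization below one half still
fits within a single path on average, but it would run that path close
to saturation, so the real slowdown would depend on how often
instruction and data accesses collide.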

I had assumed that efficient use of bandwidth would require a
narrow path to memory (with bit-addressable bit-serial being
the most efficient).  Perhaps this is not necessary.

I still suspect that there's some lower bound on the number of
bytes exchanged with cache/memory to perform the work of a
"mythical" instruction.

Don Speck   speck@vlsi.caltech.edu  {amdahl,ames!elroy}!cit-vax!speck