Path: utzoo!attcan!uunet!husc6!bloom-beacon!bu-cs!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Maximum MIPS for a given memory bandwidth?
Summary: Bandwidth where?
Message-ID: <8304@pur-ee.UUCP>
Date: 13 Jun 88 16:27:13 GMT
References: <6921@cit-vax.Caltech.Edu>
Distribution: na
Organization: Purdue University Engineering Computer Network
Lines: 46

In article <6921@cit-vax.Caltech.Edu>, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
> A while ago, Rick Richardson was looking for a microprocessor
> that could squeeze 4000 Dhrystones out of a 4 MHz 16-bit bus.
...
>   Processor	    avg read	bus	bandwidth	 MB/s:MIPS
> 		     latency   width   at the CPU   MIPS   ratio
> SUN2 (68010)	      400ns	 16	  5 MB/s     0.7     7
> Microvax II	      400ns	 32	 10 MB/s     0.9    11
> VAX-11/750	     ~440ns	 32	  9 MB/s     0.6    15
> VAX-11/780	     ~440ns	 32	 12 MB/s     1.0    12
...
> I'm wondering if there is some formula for the maximum number of
> MIPS that can be extracted from a memory system, based on its
> bandwidth, bus size, and latency, i.e. "with that memory/cache
> system you can't get more than N mips"?  With a large enough table
> of the above type, perhaps one could derive some rules of thumb in
> this direction?

Well, obviously there is such a formula using your definition of
bandwidth... in fact, you effectively used the formula above.  The major
source of inconsistency is in what constitutes a MIP.  Consider:

1. The average number of bits of memory referenced per instruction executed
   (hence also per MIP) depends on the instruction set and its encoding.
   The lower bound is 0 (i.e., processor crunching a microcoded instruction
   within its own registers) and the maximum is large-but-finite.

2. Your "bandwidth at the CPU" measure simply makes the use of CPU-internal
   registers/cache/instruction-decode-logic and the operand precision of the
   machine all-important.

For example, if we assume that, on average, a 32-bit operand will be
loaded/stored from CPU-external memory every 4 instructions and there are
8 bits per instruction, each instruction moves 8 + 32/4 = 16 bits, so we
need 2 MB/s (16 Mbits/s) for one MIP, giving a ratio of 2:1 in your
terminology.  Once you've picked your
benchmark (presumably, Dhrystones) and set the precision of the operands,
you're measuring how space-efficiently instructions are encoded and how well
the CPU-internal memory system works -- not really all that interesting,
because the choice of what to call CPU-internal and what to call
CPU-external is completely arbitrary.
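
The arithmetic in that example can be written out as a rough sketch.  The
function names and parameters below are made up for illustration, and the
model assumes only instruction fetches and operand loads/stores cross the
CPU boundary, with MB meaning 10^6 bytes:

```python
def bytes_per_instruction(instr_bits, operand_bits, instrs_per_operand):
    """Average CPU-external traffic per instruction, in bytes.

    Assumes one instruction fetch of instr_bits per instruction plus one
    operand of operand_bits loaded/stored every instrs_per_operand
    instructions (hypothetical model, not a measured figure).
    """
    bits = instr_bits + operand_bits / instrs_per_operand
    return bits / 8.0


def max_mips(bandwidth_mb_per_s, bytes_per_instr):
    """Upper bound on MIPS for a given bandwidth at the CPU.

    MB/s divided by bytes per instruction gives millions of
    instructions per second, since the factors of 10^6 cancel.
    """
    if bytes_per_instr == 0:
        # Everything stays in CPU-internal registers: no bandwidth limit.
        return float("inf")
    return bandwidth_mb_per_s / bytes_per_instr


# The example from the text: 8-bit instructions, a 32-bit operand
# every 4 instructions.
ratio = bytes_per_instruction(8, 32, 4)   # MB/s needed per MIP
print(ratio)
print(max_mips(5.0, ratio))               # bound for a 5 MB/s memory system
```

Changing the instruction encoding or the operand precision changes the
ratio directly, which is the point: the bound says more about those choices
than about the memory system.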

If you break down the bandwidth measure into bandwidths of the component
parts (i.e., on-chip registers, cache, etc.), then you might get some
interesting results...?

					-hankd