Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Maximum MIPS for a given memory bandwidth?
Message-ID: <22081@amdcad.AMD.COM>
Date: 16 Jun 88 02:52:42 GMT
References: <6921@cit-vax.Caltech.Edu> <22050@amdcad.AMD.COM> <291@wombat.UUCP> <22063@amdcad.AMD.COM> <6955@cit-vax.Caltech.Edu>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Distribution: na
Organization: Advanced Micro Devices
Lines: 43

In article <6955@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
| So the table is amended as follows:
| 
| Processor          avg read  bus     bandwidth   VAX      MB/s:MIPS
|                    latency   width   available   "MIPS"   ratio
| 25 MHz 88000       45ns?     32+32   185 MB/s?   17       11?
| 16 MHz MIPSco      ?         32+32   120 MB/s?   10?      13?
| 40 MHz RPM40       100ns     32+16   240 MB/s    15       16
| 25 MHz AMD 29000   80ns      32+32   170 MB/s    22       8
                                                  ^^^^
Well, on Dhrystone 1.1, anyway! ;-) It would probably be more
"reasonable" to reduce this to 17, which is what we see for large UNIX
utilities.
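
To make the effect on the last column concrete, here is a small sketch
that recomputes the MB/s:MIPS ratio for the 29000 row with both MIPS
figures.  The function and variable names are made up for illustration;
the 170 MB/s, 22, and 17 figures come from the table and the paragraph
above.

```python
# Recompute the MB/s:MIPS ratio for the 29000 row of the quoted table.
# Names are illustrative; the numbers are taken from the post.

BANDWIDTH_MB_S = 170.0   # available bandwidth, from the table


def mb_per_mips(bandwidth_mb_s, mips):
    """Return memory bandwidth available per VAX "MIPS"."""
    return bandwidth_mb_s / mips


dhrystone_ratio = mb_per_mips(BANDWIDTH_MB_S, 22)  # Dhrystone 1.1 rating
unix_ratio = mb_per_mips(BANDWIDTH_MB_S, 17)       # large UNIX utilities

print(f"at 22 MIPS: {dhrystone_ratio:.1f} MB/s per MIPS")
print(f"at 17 MIPS: {unix_ratio:.1f} MB/s per MIPS")
```

With the more conservative rating the ratio moves from roughly 8 to 10,
which is still the lowest ratio in the table.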

| The AMD 29000 is remarkably bandwidth-efficient, despite using 
| (on average) less than half of the memory cycles available.  
| (Maybe this points out the efficacy of their optimizer).  

That certainly has to be taken into account.

| How much would the 29000 slow down if it had only one 32-bit 
| path to a combined instruction+data cache, i.e.  half as much 
| peak memory bandwidth available? 

I just ran the benchmarks.  Both models use Video-DRAM memory with
4-cycle jumps, loads, and stores, and 1-cycle instruction burst
capability.  The first model has split I/D (i.e. an instruction burst
can proceed concurrently with a load or store).  The second must drop
the I-burst for every load or store, wait for the load or store to
complete, then start up the I-burst again (another 4 cycles).  This
simulates connection to the memory through a single shared I/D bus.
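
As a rough illustration of how the two models differ, the cycle
accounting can be sketched as below.  This is not the simulator that
produced the numbers in this post; it is a toy model with several
assumptions of its own: every instruction takes one issue cycle, the
"4 cycles" are counted as stall cycles on top of that, and in the split
model loads and stores are fully overlapped with instruction fetch
(optimistic).  The function name and the instruction counts are
hypothetical.

```python
# Toy cycle-count model of the two memory configurations.
# ASSUMPTIONS (not from the post): one issue cycle per instruction,
# 4-cycle penalties counted as extra stall cycles, and loads/stores
# completely hidden behind the instruction burst in the split model.

ACCESS_CYCLES = 4  # from the post: jumps, loads, and stores take 4 cycles


def total_cycles(n_ops, n_mem, n_jumps, shared_bus):
    """Estimate cycles for a mix of ALU ops, loads/stores, and jumps."""
    cycles = n_ops + n_mem + n_jumps        # one issue cycle each
    cycles += n_jumps * ACCESS_CYCLES       # restart the I-burst at the target
    if shared_bus:
        # Drop the I-burst, wait for the data access, then restart the burst.
        cycles += n_mem * (ACCESS_CYCLES + ACCESS_CYCLES)
    return cycles


# Hypothetical mix: 7 ALU ops, 2 loads/stores, 1 jump.
split = total_cycles(7, 2, 1, shared_bus=False)
shared = total_cycles(7, 2, 1, shared_bus=True)
print(f"split I/D: {split} cycles, shared I/D: {shared} cycles")
```

A model this crude overstates the penalty for any realistic instruction
mix; it is only meant to show where the extra cycles in the shared
configuration come from.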

	Model		Dhrystones (1.1)
	Split I/D:	24294
	Shared I/D:	18428

This is a drop in performance of 24%.  Part of this is due to not being
able to execute other instructions concurrently with an in-progress load
or store, because they cannot be fetched simultaneously.  The other part
is due to restarting the I-burst after a random load or store breaks it.
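
For anyone who wants to check the arithmetic, the 24% figure follows
directly from the two Dhrystone results above.  The variable names are
illustrative only.

```python
# Relative performance of the two models, from the Dhrystone 1.1 results.

SPLIT_DHRYSTONES = 24294    # split I/D
SHARED_DHRYSTONES = 18428   # shared I/D

# Fraction of performance lost going from split to shared.
drop = (SPLIT_DHRYSTONES - SHARED_DHRYSTONES) / SPLIT_DHRYSTONES

# The same result seen the other way: how much faster split is than shared.
speedup = SPLIT_DHRYSTONES / SHARED_DHRYSTONES

print(f"drop going to shared I/D: {drop:.1%}")
print(f"split I/D relative to shared: {speedup:.2f}x")
```

Note that the same gap reads as a larger percentage when stated as the
speedup of split over shared, since the baseline is the smaller number.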

	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)