Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Maximum MIPS for a given memory bandwidth?
Message-ID: <22081@amdcad.AMD.COM>
Date: 16 Jun 88 02:52:42 GMT
References: <6921@cit-vax.Caltech.Edu> <22050@amdcad.AMD.COM> <291@wombat.UUCP> <22063@amdcad.AMD.COM> <6955@cit-vax.Caltech.Edu>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Distribution: na
Organization: Advanced Micro Devices
Lines: 43

In article <6955@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
| So the table is amended as follows:
|
| Processor         avg read  bus     bandwidth   VAX     MB/s:MIPS
|                   latency   width   available   "MIPS"  ratio
| 25 MHz 88000      45ns?     32+32   185 MB/s?   17      11?
| 16 MHz MIPSco     ?         32+32   120 MB/s?   10?     13?
| 40 MHz RPM40      100ns     32+16   240 MB/s    15      16
| 25 MHz AMD 29000  80ns      32+32   170 MB/s    22      8
                                                 ^^^^
Well, on Dhrystone 1.1, anyway! ;-)  It would probably be more
"reasonable" to reduce this to 17, which is what we see for large UNIX
utilities.

| The AMD 29000 is remarkably bandwidth-efficient, despite using
| (on average) less than half of the memory cycles available.
| (Maybe this points out the efficacy of their optimizer).

That certainly has to be taken into account.

| How much would the 29000 slow down if it had only one 32-bit
| path to a combined instruction+data cache, i.e. half as much
| peak memory bandwidth available?

I just ran the benchmarks.  Both models are Video-DRAM memory with
4-cycle jumps, loads, and stores, and 1-cycle instruction burst
capability.  The first model has split I/D (i.e. can have an
instruction burst concurrent with a load or store).  The second must
drop I-burst for every load or store, wait for the load or store to
complete, then start up the I-burst again (another 4 cycles).  This
simulates connection to the memory through a single shared I/D bus.

        Model           Dhrystones (1.1)
        Split I/D:      24294
        Shared I/D:     18428

This is a drop in performance of 24%.
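
The 24% figure, and the effect of restating the 29000 at 17 rather than 22
VAX MIPS, can be checked with a little arithmetic.  What follows is only a
sketch: the function and variable names are made up for illustration, and
the "time ratio" assumes that run time on a fixed workload is inversely
proportional to the Dhrystone score.

```python
# Figures quoted in the post; all names here are illustrative only.
SPLIT_ID = 24294    # Dhrystones (1.1), split I/D buses
SHARED_ID = 18428   # Dhrystones (1.1), single shared I/D bus
BANDWIDTH_MB_S = 170  # 25 MHz AMD 29000 row of the quoted table


def throughput_drop(before, after):
    """Fractional loss in benchmark score going from `before` to `after`."""
    return (before - after) / before


def time_ratio(before, after):
    """Relative run time on a fixed workload.

    Assumption: run time is inversely proportional to the score.
    """
    return before / after


def bandwidth_per_mips(mb_per_s, mips):
    """MB/s of available bandwidth per VAX MIPS (the table's last column)."""
    return mb_per_s / mips


drop = throughput_drop(SPLIT_ID, SHARED_ID)
slowdown = time_ratio(SPLIT_ID, SHARED_ID)
ratio_dhrystone = bandwidth_per_mips(BANDWIDTH_MB_S, 22)
ratio_unix = bandwidth_per_mips(BANDWIDTH_MB_S, 17)

print(f"throughput drop:       {drop:.1%}")
print(f"relative run time:     {slowdown:.3f}x")
print(f"MB/s:MIPS at 22 MIPS:  {ratio_dhrystone:.1f}")
print(f"MB/s:MIPS at 17 MIPS:  {ratio_unix:.1f}")
```

Note that a 24% drop in Dhrystones corresponds to roughly a 32% increase in
run time under that assumption, and that using 17 MIPS moves the 29000's
ratio from about 8 to 10, closer to the 88000 row.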
Part of this drop is due to not being able to execute other
instructions concurrently with an in-progress load or store, because
they cannot be fetched while the shared bus is busy.  The other part is
due to restarting the I-burst after a random load or store breaks it.

-- 
	Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)