Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!think!ames!amdcad!tim
From: tim@amdcad.UUCP (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Am29000 and MIPS
Message-ID: <15232@amdcad.UUCP>
Date: Thu, 19-Mar-87 18:56:28 EST
Article-I.D.: amdcad.15232
Posted: Thu Mar 19 18:56:28 1987
Date-Received: Sat, 21-Mar-87 07:21:55 EST
References: <15192@amdcad.UUCP> <1423@husc6.UUCP> <15243@sun.uucp> <17915@ucbvax.BERKELEY.EDU>
Distribution: world
Organization: AMDCAD, Sunnyvale, CA
Lines: 72
Keywords: RISC MIPS Simulation Performance

Mike Shebanow writes:
| I have several questions regarding the simulation. It would
| be quite unfair to compare the running times for a real VAX 11/780 against
| a simulation, unless the following effects are included in the simulation:
+-----
Agreed.  You must always take *any* benchmark comparisons with a grain of
salt.  The numbers posted were only meant to provide a rough idea of the
performance attainable.

| 1) Were cold start effects included? If so, how are they simulated?
+-----
Yes.  All caches (including the TLB) are initially invalidated.  The
simulated program exists in memory, but is "faulted in" to the TLB during
execution.

| 2) How were page faults simulated? Are these times included?
+-----
Page faults are a secondary effect of TLB misses, and times for them are
greatly dependent on disk speed, etc.  Page fault processing time is not
included in the accumulated user time on real machines, so we don't
include it either.  We *do* count all of the processing (both user and
system) time it takes to perform TLB miss handling and system call
entry/exit.

| 3) Was logic for the cache and TLB (an assumption) simulated?
+-----
Yes.  All of the caches (both internal and external) were fully simulated
with "real" misses, reloading, etc.
The models are not derived from statistical averages, and the numbers we
used (both size and access time) were taken from what we felt would be
feasible both now and in the near future.

| 4) Are the effects of I/O simulated (say 10, 25, and 50% bus bandwidth
| consumed by I/O devices)? What model?
+-----
This opens a whole new "can of worms", and is why most benchmarks don't
address the I/O issue.  I/O effects were not simulated, but, as far as bus
bandwidth is concerned, external cache reload is only consuming 20% to
25%, so concurrent DMA into memory should not degrade the performance too
much.  Note that our large number of registers reduces the number of data
cache accesses, which further reduces the reload bandwidth requirement.

| 5) How are system times (I assume that UNIX is used) calculated? Is this
| work done by the UNIX kernel simulated?
+-----
If you are asking whether we are simulating an entire UNIX kernel, the
answer is no.  We are simulating compiled C code supported by a C runtime
library.  System calls are implemented as traps, some of which are
executed directly in 29000 code; others, which are I/O dependent (like
fopen), are passed to the host system for processing.  However, we are not
comparing system times, just user times.  Granted, there is some
interaction; we have tried to stay on the conservative side with our
numbers.

| I don't mean to put the AM29000 down (as including all of the above into
| a simulation is beyond difficult), but using simulation times to compare
| performance against a real machine is unreasonable (as is a MIPS
| to MIPS comparison).
|
| Mike Shebanow
| shebanow@ji.berkeley.edu
+----
You certainly do bring up valid points; we also inform customers about our
potential simulation limitations when they run benchmarks on our
simulator.  We aren't trying to "fool" anyone here, just attempting to
provide a realistic assessment of performance until parts are available.

	-- Tim Olson
	Advanced Micro Devices
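
For readers who want to see what "initially invalidated" caches and "real" misses mean in a trace-driven model, here is a minimal sketch. It is not the AMD simulator: the class name, the direct-mapped organization, the cache and TLB sizes, and the address trace are all assumptions chosen for illustration, and the TLB is modeled simply as a cache whose "line" is one page.

```python
class DirectMappedCache:
    """Tag store for a direct-mapped cache; every entry starts invalid."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # None marks an invalidated entry
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up addr; on a miss, reload the entry. Returns True on a hit."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag  # the "reload" step of a real miss
        return False


def simulate():
    """Run an assumed trace through a cold cache and a cold TLB."""
    # Assumed sizes: 64 lines of 16 bytes, and 32 TLB entries of 4 KB pages.
    cache = DirectMappedCache(num_lines=64, line_bytes=16)
    tlb = DirectMappedCache(num_lines=32, line_bytes=4096)

    # Assumed trace: a loop touching 64 consecutive words, executed 10 times.
    one_pass = [0x1000 + 4 * i for i in range(64)]
    trace = one_pass * 10

    for addr in trace:
        tlb.access(addr)    # translation is "faulted in" on first touch
        cache.access(addr)  # cold-start misses are counted, not averaged
    return cache, tlb


if __name__ == "__main__":
    cache, tlb = simulate()
    print("cache hits/misses:", cache.hits, cache.misses)
    print("TLB hits/misses:  ", tlb.hits, tlb.misses)
```

With these assumed parameters the first pass through the loop takes 16 cache misses and 1 TLB miss, and the remaining nine passes hit every time. The point is that the cold-start cost shows up in the counts because each access is looked up individually rather than charged a statistical average miss rate.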