Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!think!ames!amdcad!tim
From: tim@amdcad.UUCP (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Am29000 and MIPS
Message-ID: <15232@amdcad.UUCP>
Date: Thu, 19-Mar-87 18:56:28 EST
Article-I.D.: amdcad.15232
Posted: Thu Mar 19 18:56:28 1987
Date-Received: Sat, 21-Mar-87 07:21:55 EST
References: <15192@amdcad.UUCP> <1423@husc6.UUCP> <15243@sun.uucp> <17915@ucbvax.BERKELEY.EDU>
Distribution: world
Organization: AMDCAD, Sunnyvale, CA
Lines: 72
Keywords: RISC MIPS Simulation Performance

Mike Shebanow writes:

| I have several questions regarding the simulation. It would
| be quite unfair to compare the running times for a real VAX 11/780 against
| a simulation, unless the following effects are included in the simulation:
+-----
Agreed. You must always take *any* benchmark comparison with a grain of
salt. The numbers posted were only meant to provide a rough idea of the
performance attainable.

| 1) Were cold start effects included? If so, how are they simulated?
+-----
Yes.  All caches (including the TLB) are initially invalidated.  The
simulated program exists in memory, but is "faulted in" to the TLB
during execution.
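
The cold-start behaviour described above can be sketched roughly as
follows.  This is a minimal illustration, not the actual simulator: the
page size, the miss penalty, and the unbounded TLB capacity are all
assumptions made up for the example.

```python
# Hypothetical cold-start TLB model; every constant here is an assumption.
PAGE_SIZE = 4096      # assumed page size in bytes
MISS_PENALTY = 40     # assumed cycles spent in the TLB miss handler


class ColdTLB:
    def __init__(self):
        self.entries = set()   # empty set: every entry starts invalidated
        self.misses = 0
        self.cycles = 0

    def access(self, addr):
        page = addr // PAGE_SIZE
        if page not in self.entries:
            # First touch of a page: charge the miss handler, then map it.
            self.misses += 1
            self.cycles += MISS_PENALTY
            self.entries.add(page)
        self.cycles += 1       # the access itself


tlb = ColdTLB()
for addr in (0, 4, 4096, 8, 4100):
    tlb.access(addr)
print(tlb.misses, tlb.cycles)
```

Because the TLB starts empty, the first reference to each page pays the
miss penalty, so short runs are charged their full warm-up cost.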
 
| 2) How were page faults simulated?  Are these times included?
+-----
Page faults are a secondary effect of TLB misses, and times for them are
greatly dependent on disk speed, etc.  Page fault processing time is not
included in the accumulated user time on real machines, so we don't
include it either.  We *do* count all of the processing (both user and
system) time it takes to perform TLB miss handling and system call
entry/exit.

| 3) Was logic for the cache and TLB (an assumption) simulated?
+-----
Yes.  All of the caches (both internal and external) were fully
simulated with "real" misses, reloading, etc.  The models are not
derived from statistical averages, and the numbers we used (both size
and access time) were taken from what we felt would be feasible both now
and in the near future.
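
To illustrate what simulating "real" misses means, as opposed to
applying an average miss rate, here is a minimal sketch of a
direct-mapped cache driven by an address trace.  The organization, the
sizes, and the cycle counts are assumptions chosen for the example and
are not the parameters of the actual simulation.

```python
# Hypothetical direct-mapped cache model; sizes and timings are assumed.
class DirectMappedCache:
    def __init__(self, num_lines, line_size, hit_cycles, miss_cycles):
        self.num_lines = num_lines
        self.line_size = line_size
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.tags = [None] * num_lines   # None marks an invalid line
        self.hits = 0
        self.misses = 0
        self.cycles = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            self.cycles += self.hit_cycles
        else:
            # A real miss: reload the line and pay the reload time.
            self.misses += 1
            self.tags[index] = tag
            self.cycles += self.miss_cycles


cache = DirectMappedCache(num_lines=4, line_size=16,
                          hit_cycles=1, miss_cycles=10)
for addr in (0, 4, 64, 0, 16, 20):
    cache.access(addr)
print(cache.hits, cache.misses, cache.cycles)
```

In this trace, addresses 0 and 64 map to the same line and evict each
other, a conflict that a statistical average would not capture.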

| 4) Are the effects of I/O simulated (say 10, 25, and 50% bus bandwidth
| consumed by I/O devices)? What model?
+-----
This opens a whole new "can of worms", and is why most benchmarks don't
address the I/O issue.  I/O effects were not simulated, but, as far as
bus bandwidth is concerned, external cache reload consumes only 20% to
25% of it, so concurrent DMA into memory should not degrade
performance too much.  Note that our large number of registers reduces
the number of data cache accesses, which further reduces the reload
bandwidth requirement.
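
As a back-of-the-envelope check, the 25% reload figure can be combined
with the 10, 25, and 50% I/O loads from the question.  The sketch below
assumes the two loads simply add, which ignores arbitration and
contention effects; the function name is hypothetical.

```python
# Simple additive model (an assumption): bus use = cache reload + I/O.
def bus_headroom(reload_pct, io_pct):
    """Return (total bus use, spare bandwidth) in percent."""
    total = reload_pct + io_pct
    return total, 100 - total


results = [bus_headroom(25, io) for io in (10, 25, 50)]
for (total, spare), io in zip(results, (10, 25, 50)):
    print(f"I/O {io}%: bus {total}% used, {spare}% spare")
```

Even at 50% I/O load the bus is not saturated under this simple model,
though real contention would add some delay to each reload.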

| 5) How are system times (I assume that UNIX is used) calculated? Is this
| work done by the UNIX kernel simulated?
+-----
If you are asking whether we are simulating an entire UNIX kernel, the
answer is no.  We are simulating compiled C code supported by a C
runtime library.  System calls are implemented as traps: some are
executed directly in 29000 code, while others that are I/O dependent
(like fopen) are passed to the host system for processing.  However, we
are not comparing system times, just user times.  Granted, there is some
interaction; we have tried to stay on the conservative side with our
numbers.
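
The split between traps handled inside the simulation and traps passed
to the host can be pictured as a dispatch table.  This is only a
sketch: the trap names "sbrk" and "getcwd", the starting heap address,
and the class itself are hypothetical and do not describe the actual
simulator.

```python
import os


class TrapDispatcher:
    """Routes simulated system-call traps to a local or a host handler."""

    def __init__(self):
        self.heap_top = 0x10000   # assumed initial break address
        self.local_traps = 0
        self.host_traps = 0
        # Hypothetical trap names, chosen only for illustration.
        self.local = {"sbrk": self._sbrk}       # handled in simulated code
        self.host = {"getcwd": os.getcwd}       # forwarded to the host OS

    def _sbrk(self, increment):
        old = self.heap_top
        self.heap_top += increment
        return old

    def trap(self, name, *args):
        if name in self.local:
            self.local_traps += 1
            return self.local[name](*args)
        if name in self.host:
            self.host_traps += 1
            return self.host[name](*args)
        raise ValueError(f"unknown trap: {name}")


d = TrapDispatcher()
old_break = d.trap("sbrk", 256)
cwd = d.trap("getcwd")
print(hex(old_break), hex(d.heap_top), d.local_traps, d.host_traps)
```

Keeping separate counters for the two kinds of trap makes it easy to
see how much of a run depends on host-side processing.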


| I don't mean to put the AM29000 down (as including all of the above into
| a simulation is beyond difficult), but using simulation times to compare
| performance against a real machine is unreasonable (as is a MIPS
| to MIPS comparison).
| 
| Mike Shebanow
| shebanow@ji.berkeley.edu
+-----
You certainly do bring up valid points; we also inform customers about
the simulator's potential limitations when they run benchmarks on it.
We aren't trying to "fool" anyone here, just attempting to provide a
realistic assessment of performance until parts are available.

	-- Tim Olson
	Advanced Micro Devices