Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ll-xn!ames!amdahl!nsc!grenley
From: grenley@nsc.nsc.com (George Grenley)
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: Performance of the 532
Message-ID: <4299@nsc.nsc.com>
Date: Fri, 8-May-87 14:57:41 EDT
Article-I.D.: nsc.4299
Posted: Fri May  8 14:57:41 1987
Date-Received: Sat, 9-May-87 20:45:28 EDT
References: <324@dumbo.UUCP> <809@killer.UUCP> <2417@homxa.UUCP> <4294@nsc.nsc.com> <374@winchester.UUCP>
Reply-To: grenley@nsc.UUCP (George Grenley)
Organization: National Semiconductor, Sunnyvale
Lines: 112
Keywords: Silicon Indy
Xref: mnetor comp.arch:1221 comp.sys.nsc.32k:141

My 532 performance posting has generated some response.  Good!  Herewith,
more details:  For those of you who wish to run the grep benchmark yourself,
the test was to search for the string "int" in the file chroot.c in the Unix V
source.  I realize this means you need the source....
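A minimal sketch of the workload, for anyone who wants to see exactly what is being searched: count the lines containing the literal string "int", as `grep -c int chroot.c` would. This is a Python stand-in written for illustration, not the program that was timed; `count_matching_lines` and `time_search` are hypothetical names, and the time it reports measures the Python interpreter on your machine, not the 532.

```python
import time


def count_matching_lines(text: str, pattern: str) -> int:
    """Count lines containing `pattern` as a plain substring (like `grep -c`)."""
    return sum(1 for line in text.splitlines() if pattern in line)


def time_search(text: str, pattern: str, repeats: int = 100) -> tuple[int, float]:
    """Repeat the search and return (match count, average seconds per pass)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    count = 0
    start = time.perf_counter()
    for _ in range(repeats):
        count = count_matching_lines(text, pattern)
    elapsed = time.perf_counter() - start
    return count, elapsed / repeats
```

A plain substring search also matches identifiers such as `printf` and `uint`, just as grep does; pass it `open("chroot.c").read()` if you have the source.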

In article <374@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>In article <4294@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>> [deleted lengthy reference to 32532 performance - see previous posting]
>Could you say a little more on the configurations:
>	cache size, nature [write-back or write-thru]
>	if write-thru, did you use write buffers, and if so, how deep.
>	exactly what the assumptions were on the VME memories

Okay, good questions.  Here are some answers:

The 532 itself has a 512-byte instruction cache (16-byte lines) and a
2-way set-associative data cache of 1024 bytes; see our published
advance data sheet for details.  I'm sure our NSC sales people would be
happy to hear from you...

One purpose of building the simulator was to evaluate EXTERNAL cache designs.
Because of real-estate restrictions, we settled on a direct-mapped cache of
64 Kbytes.  Both internal and external caches are write through.
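To show what such a simulator models, here is a minimal sketch of a direct-mapped, write-through cache driven by an address trace. The 64 Kbyte size comes from the post; the 16-byte line size (borrowed from the on-chip instruction cache) and the no-write-allocate policy are assumptions, and this is not NSC's simulator.

```python
class DirectMappedWriteThroughCache:
    """Direct-mapped, write-through cache model (tags only, no data).

    ASSUMPTIONS for illustration: 16-byte lines and no-write-allocate.
    """

    def __init__(self, size_bytes: int = 64 * 1024, line_bytes: int = 16):
        if size_bytes % line_bytes:
            raise ValueError("cache size must be a multiple of the line size")
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes  # 4096 lines for 64 KB / 16 B
        self.tags = [None] * self.num_lines        # None marks an invalid line
        self.read_hits = 0
        self.read_misses = 0
        self.memory_writes = 0

    def _index_and_tag(self, addr: int) -> tuple[int, int]:
        # Each address maps to exactly one line: index = line number mod lines.
        line_number = addr // self.line_bytes
        return line_number % self.num_lines, line_number // self.num_lines

    def read(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fetch the line, replacing the old one."""
        index, tag = self._index_and_tag(addr)
        if self.tags[index] == tag:
            self.read_hits += 1
            return True
        self.read_misses += 1
        self.tags[index] = tag
        return False

    def write(self, addr: int) -> None:
        """Write-through: every write goes to memory.

        With no-write-allocate, a write miss does not load the line, and a
        write hit only updates data this model does not track, so the tags
        are left unchanged either way.
        """
        self.memory_writes += 1

    def read_hit_rate(self) -> float:
        total = self.read_hits + self.read_misses
        return self.read_hits / total if total else 0.0
```

Two addresses exactly 64 Kbytes apart map to the same line and evict each other on every alternate access; that conflict behavior is what a direct-mapped design gives up in exchange for its simplicity and speed.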

We use a write buffer of depth 8.  Our analysis shows that the probability
of it filling with typical VME memories is < 1%.
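Whether an 8-deep write buffer fills depends on how bursty the writes are relative to memory's write time. The sketch below is a hypothetical cycle-by-cycle model: the write trace and the fixed `drain_cycles` memory write time are assumptions, not the VME timings NSC used, so it illustrates the mechanism rather than reproducing the < 1% figure.

```python
def simulate_write_buffer(write_trace: list[bool], depth: int = 8,
                          drain_cycles: int = 4) -> tuple[int, int]:
    """Cycle-by-cycle model of a FIFO write buffer between CPU and memory.

    write_trace[i] is True if the CPU's i-th cycle of work posts a write.
    Memory retires the oldest buffered write after working on it for
    `drain_cycles` cycles (a hypothetical fixed write time). When the
    buffer is full, the CPU stalls and retries the same write next cycle.
    Returns (stall_cycles, total_cycles).
    """
    if depth < 1 or drain_cycles < 1:
        raise ValueError("depth and drain_cycles must be >= 1")
    queued = 0     # writes currently held in the buffer
    progress = 0   # cycles memory has spent on the write at the head
    stalls = 0
    cycles = 0
    i = 0          # next entry of the trace the CPU will execute
    while i < len(write_trace):
        cycles += 1
        # Memory side: finish the head write once it has taken drain_cycles.
        if queued:
            progress += 1
            if progress == drain_cycles:
                queued -= 1
                progress = 0
        # CPU side: post the write if there is room, otherwise stall.
        if write_trace[i] and queued >= depth:
            stalls += 1
        else:
            if write_trace[i]:
                queued += 1
            i += 1
    return stalls, cycles
```

Writes spaced farther apart than `drain_cycles` never stall; a long back-to-back burst fills the buffer after about ten cycles and from then on the CPU runs at the memory's drain rate.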

For VME memory performance, we used published DRAM specs from several vendors
such as Clearpoint, Microproject, etc.  We assumed high availability of
the memories (and VMEbus) - i.e., no heavy DMA or multiprocessing.

>It would also be interesting [although I realize this might be
>sensitive info] to get more info on the simulations, to be able to
>make a read on the accuracy of the simulations:
>
>	instruction cycles
>	TLB-miss cycles
>	cache-miss cycles
>	[if present] write-buffer stall & write/read interlock cycles

You are right on two counts:  It would be interesting, and it is also
sensitive.  Such data would give our competitors too good an idea of how
good our cache(s) really are, so I shall refrain from any detailed
discussion of this for a while.

I will say this:  I am a hardware designer, not a CPU architect or a
software guy.  With the rapid rise in CPU clock rate (30 MHz for the
'532) and memory demands (2 clock cycles = 66 nsec GROSS), internal
caches are becoming a necessity.  I would not want to have to design
the cache for the 532 if it did not already have one - it would be
expensive to do without seriously compromising performance.
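For readers checking the arithmetic: the clock period is the reciprocal of the frequency, so a two-clock memory access at 30 MHz leaves roughly 66 ns end to end. "GROSS" presumably means before address decode, buffer, and bus delays are subtracted; that reading is an interpretation, not something the post spells out.

```latex
t_{\text{clk}} = \frac{1}{f} = \frac{1}{30\ \text{MHz}} \approx 33.3\ \text{ns},
\qquad
2\,t_{\text{clk}} \approx 66.7\ \text{ns}
```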

But, just to tease people a bit, the combined internal and external
caches used in the above simulations have an overall read hit rate of
better than 93%.  As a result, the system is relatively insensitive
to main memory performance.
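Why a read hit rate above 93% makes the system insensitive to main memory can be seen from a simple average-access-time model. This is a textbook simplification that ignores write traffic and bus contention, and the symbols are introduced here rather than taken from the post: h is the combined read hit rate, t_hit the cache access time, t_miss the total time to service a miss from main memory.

```latex
t_{\text{avg}} = h\,t_{\text{hit}} + (1-h)\,t_{\text{miss}},
\qquad
\frac{\partial t_{\text{avg}}}{\partial t_{\text{miss}}} = 1 - h < 0.07
\quad\text{for } h > 0.93
```

So each extra nanosecond of main-memory read time adds less than 0.07 ns to the average read time.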

[deleted, req to the world to get together for a lil old-fashioned horse-race]

>I think that's a great idea and am delighted that somebody has suggested it.
>Presumably there will be 68030s benchmarkable in hardware by then,
>and certainly 386s, Clippers, and WE32200s.  As a first suggestion,
>I'd observe that there are at least the following classes of realistic
>benchmarks:
>	1) Large FORTRAN / C floating-point ones [and there are many of these
>	that are widely available].  One probably needs at least 5-10 of these
>	to cover the different sorts of things that people do.

My understanding is that the LINPACK benchmark has become the standard for
the number-crunching guys.  We used it at Jetas Technology when I worked
there as our standard of reference for numeric performance.  It seems
to represent well the kind of array-oriented math typical of FORTRAN.
(good old FORTRAN - the only HLL I was ever really good at 8-))
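For reference, most of LINPACK's time goes into DAXPY (y = a*x + y), the Level-1 BLAS routine called from the inner loop of the DGEFA factorization. The sketch below shows that kernel and the conventional LINPACK operation count of 2n^3/3 + 2n^2 used to turn a solve time into MFLOPS; it is a Python illustration, not the Fortran benchmark itself, and `linpack_mflops` is a hypothetical helper.

```python
def daxpy(a: float, x: list[float], y: list[float]) -> None:
    """In place: y <- a*x + y (the BLAS-1 kernel LINPACK spends most time in)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        y[i] += a * x[i]


def linpack_mflops(n: int, seconds: float) -> float:
    """Rate for factoring and solving an n-by-n system in `seconds`,
    using the conventional LINPACK operation count 2n^3/3 + 2n^2."""
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return flops / seconds / 1e6
```

Because DAXPY streams two vectors through the processor with one multiply and one add per element, it stresses the FPU and the data cache together, which is much of why LINPACK has stuck as a numeric yardstick.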

>	2) Large integer benchmarks: this is the real tough category:
>	most of the larger, realistic ones tend to be proprietary codes,
>	or else things where the code [like for assemblers, compilers, etc]
>	inherently differs among systems.  this also needs 5-10 of them,
>	and could at least include a few of the larger UNIX utilities,
>	although most of them fit into reasonable-sized caches, and hence
>	don't stress things the way larger applications do.

How about, instead, compiles?  They are usually CPU-intensive (unless you
have a REALLY terrible disk system) and reflect the general non-numeric
workload typical of most CPUs.  I recall reading analyses of instruction
mixes from many different non-numeric applications that show they don't
vary much.  Since compiling Unix source files is a "portable test," it
might be suitable...

>	3) Multi-user and/or systems benchmarks, using UNIX.  Run shell
>	scripts, etc.  I'd think there should at least be a few of these.

>One might want to focus on 1&2, if only to avoid the arguments on 3
>regarding different peripheral choices, operating system tuning, etc,
>unless the shootout is intended as an OS shootout also.

Personally, I see item 3 as being at least as important as 1 and 2,
from a practical point of view.  Overall system performance is 
ultimately the only thing that matters.  Whether system A is faster
than system B because the OS is a better port, or the compiler
optimizes better, or the I/O subsystem is really hot, is of interest
to us as system designers only insofar as it helps us design better
systems.  This is the primary reason why I'm talking about the 532,
to help other hardware hacks do a good job designing it in.  In
my experience all CPUs are the same speed when they're sitting on
your desk... 8-)

All for now.

Regards,
George Grenley
usual disclaimer - either I'm lying, or not...