Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: Performance of the 532
Message-ID: <374@winchester.UUCP>
Date: Thu, 7-May-87 15:20:35 EDT
Article-I.D.: winchest.374
Posted: Thu May  7 15:20:35 1987
Date-Received: Sat, 9-May-87 09:39:22 EDT
References: <324@dumbo.UUCP> <809@killer.UUCP> <2417@homxa.UUCP> <4294@nsc.nsc.com>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 56
Xref: mnetor comp.arch:1212 comp.sys.nsc.32k:136

In article <4294@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>
>So, here are some simulated facts:  Our design team has done simulations
>of the chip's performance, both with ideal 0 wait state memory and with
>"real world" typical VME bus memory.  We ran some unix utilities, including
>our own compilers, etc.  I will divulge a few of these numbers now, and more
>later.  (If I don't get burned for this):

>Grep ran at 8.4 mips from 0 ws memory, 7.9 from VME.  Grep was one of the
>best.  One of the worst was our assembler, it hit 5 mips from 0 ws, and
>4.5 mips from VME.  On the average (these two plus several other CPU 
>intensive programs) the '532 hit 6.1 mips from 0 ws, 5.3 mips from VME.
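
Taking the quoted figures at face value, the relative cost of moving from
0 ws to VME memory falls straight out of the arithmetic.  A minimal sketch
[Python is my choice for illustration, not anything NSC used], built only
from the three number pairs quoted above:

```python
# (0 ws MIPS, VME MIPS) pairs exactly as quoted in the article above.
quoted = {
    "grep":      (8.4, 7.9),
    "assembler": (5.0, 4.5),
    "average":   (6.1, 5.3),
}

def vme_slowdown_pct(zero_ws, vme):
    """Percent of 0 ws throughput lost when running from VME memory."""
    return 100.0 * (zero_ws - vme) / zero_ws

for name, (zero_ws, vme) in quoted.items():
    print("%-10s %.1f%% slower from VME" % (name, vme_slowdown_pct(zero_ws, vme)))
```

How large that loss is depends heavily on the cache and memory
assumptions, which is what the questions below are after.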

Could you say a little more on the configurations:
	cache size, nature [write-back or write-thru]
	if write-thru, did you use write buffers, and if so, how deep.
	exactly what the assumptions were on the VME memories

It would also be interesting [although I realize this might be
sensitive info] to get more info on the simulations, to be able to
make a read on the accuracy of the simulations:

	instruction cycles
	TLB-miss cycles
	cache-miss cycles
	[if present] write-buffer stall & write/read interlock cycles
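
To show why that breakdown matters, here is a rough sketch of how the
categories above would combine into a MIPS figure.  It assumes a simple
additive stall model [an assumption of mine, not a description of NSC's
simulator], and every number in the example is hypothetical, chosen only
to exercise the formula:

```python
def effective_mips(clock_mhz, base_cpi, stalls):
    """Native MIPS under a simple additive stall model.

    base_cpi: average instruction cycles with no memory-system stalls.
    stalls:   list of (events per instruction, penalty cycles) pairs,
              e.g. cache misses, TLB misses, write-buffer stalls.
    """
    cpi = base_cpi + sum(rate * penalty for rate, penalty in stalls)
    return clock_mhz / cpi

# HYPOTHETICAL inputs: none of these are 532 figures.
CLOCK_MHZ = 30.0
BASE_CPI = 3.0
TLB = (0.001, 20)       # TLB misses per instruction, cycles per miss
WRITE_BUF = (0.02, 2)   # write-buffer stalls per instruction, cycles each

fast = effective_mips(CLOCK_MHZ, BASE_CPI, [(0.05, 4), TLB, WRITE_BUF])
slow = effective_mips(CLOCK_MHZ, BASE_CPI, [(0.05, 8), TLB, WRITE_BUF])
print("fast memory: %.2f mips, slow memory: %.2f mips" % (fast, slow))
```

Only the cache-miss penalty differs between the two runs, so the gap
between them is set entirely by the miss rate and the memory latency.
Without those per-category counts, a reader cannot tell which term is
driving a quoted MIPS number.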

>So, here's the deal.  I invite Mot, Intel, and other interested parties
>to work with me in defining some sort of realistic benchmark, which we'll
>run (in public).  I expect to have system level hardware late this year,
>so if we get started now, we'll have very interesting Xmas presents...

I think that's a great idea and am delighted that somebody has suggested it.
Presumably there will be 68030s benchmarkable in hardware by then,
and certainly 386s, Clippers, and WE32200s.  As a first suggestion,
I'd observe that there are at least the following classes of realistic
benchmarks:
	1) Large FORTRAN / C floating-point ones [and there are many of these
	that are widely available].  One probably needs at least 5-10 of these
	to cover the different sorts of things that people do.
	2) Large integer benchmarks: this is the real tough category:
	most of the larger, realistic ones tend to be proprietary codes,
	or else things where the code [like for assemblers, compilers, etc]
	inherently differs among systems.  This also needs 5-10 of them,
	and could at least include a few of the larger UNIX utilities,
	although most of them fit into reasonable-sized caches, and hence
	don't stress things the way larger applications do.
	3) Multi-user and/or systems benchmarks, using UNIX.  Run shell
	scripts, etc.  I'd think there should at least be a few of these.
One might want to focus on 1&2, if only to avoid the arguments on 3
regarding different peripheral choices, operating system tuning, etc,
unless the shootout is intended as an OS shootout also.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086