Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ll-xn!ames!amdahl!nsc!grenley
From: grenley@nsc.nsc.com (George Grenley)
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: Performance of the 532
Message-ID: <4299@nsc.nsc.com>
Date: Fri, 8-May-87 14:57:41 EDT
Article-I.D.: nsc.4299
Posted: Fri May 8 14:57:41 1987
Date-Received: Sat, 9-May-87 20:45:28 EDT
References: <324@dumbo.UUCP> <809@killer.UUCP> <2417@homxa.UUCP> <4294@nsc.nsc.com> <374@winchester.UUCP>
Reply-To: grenley@nsc.UUCP (George Grenley)
Organization: National Semiconductor, Sunnyvale
Lines: 112
Keywords: Silicon Indy
Xref: mnetor comp.arch:1221 comp.sys.nsc.32k:141

My 532 performance posting has generated some response.  Good!  Herewith,
more details:

For those of you who wish to run the grep benchmark yourself, the test was
to search for the string "int" in file chroot.c in Unix V source.  I realize
this means you need the source....

In article <374@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>In article <4294@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>> [deleted lengthy reference to 32532 performance - see previous posting]
>Could you say a little more on the configurations:
>    cache size, nature [write-back or write-thru]
>    if write-thru, did you use write buffers, and if so, how deep.
>    exactly what the assumptions were on the VME memories

Okay, good questions.  Here are some answers:

The 532 itself has a 512 byte (16 byte line size) instruction cache, and
a 2 way set associative data cache of 1024 bytes - see our published advance
data sheet for details.  I'm sure our NSC sales people would be happy to
hear from you...

One purpose of building the simulator was to evaluate EXTERNAL cache designs.
Because of real-estate restrictions, we settled on a direct-mapped cache of
64 Kbytes.  Both internal and external caches are write through.  We use a
write buffer of depth 8.
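
The cache sizes quoted above can be turned into set counts and address
splits with a little arithmetic.  What follows is only a rough sketch, not
anything taken from the simulator.  The post gives the total sizes, the
16 byte line of the instruction cache, and the associativity of the data
and external caches.  The 16 byte line size for the data and external
caches and the direct-mapped instruction cache are assumptions made for
illustration, and the CacheGeometry class and its method names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper: describes a cache by size, line size and ways."""
    size_bytes: int   # total capacity in bytes
    line_bytes: int   # bytes per cache line
    ways: int         # 1 means direct mapped

    @property
    def sets(self):
        # Number of sets = capacity / (line size * associativity).
        return self.size_bytes // (self.line_bytes * self.ways)

    def split(self, addr):
        """Return (tag, set index, byte offset) for a byte address."""
        offset = addr % self.line_bytes
        index = (addr // self.line_bytes) % self.sets
        tag = addr // (self.line_bytes * self.sets)
        return tag, index, offset


# Sizes come from the post; anything marked ASSUMED does not.
icache = CacheGeometry(512, 16, 1)           # ways ASSUMED (direct mapped)
dcache = CacheGeometry(1024, 16, 2)          # line size ASSUMED
external = CacheGeometry(64 * 1024, 16, 1)   # line size ASSUMED

for name, cache in (("icache", icache), ("dcache", dcache),
                    ("external", external)):
    print(f"{name}: {cache.sets} sets, {cache.ways} way(s), "
          f"{cache.line_bytes} byte lines")
```

Under those assumptions the external cache has 4096 sets, so two addresses
that are a multiple of 64 Kbytes apart compete for the same line.  That
conflict behaviour is the usual price of a direct-mapped design.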
Our analysis shows the probability of the write buffer filling with typical
VME memories is < 1%.

For VME memory performance, we used published DRAM specs from several vendors
such as Clearpoint, Microproject, etc.  We assumed high availability of the
memories (and VMEbus) - i.e., no heavy DMA or multiprocessing.

>It would also be interesting [although I realize this might be
>sensitive info] to get more info on the simulations, to be able to
>make a read on the accuracy of the simulations:
>
>    instruction cycles
>    TLB-miss cycles
>    cache-miss cycles
>    [if present] write-buffer stall & write/read interlock cycles

You are right on two counts: It would be interesting, and it is also
sensitive.  Such data would give our competitors too good an idea of how
good our cache(s) really are, so I shall refrain from any detailed
discussion of this for a while.

I will say this:  I am a hardware designer, not a CPU architect or a software
guy.  With the rapid rise in CPU clock rate (30 MHz for the '532) and memory
demands (2 clock cycles = 66 nsec GROSS), internal caches are becoming a
necessity.  I would not want to have to design the cache for the 532 if
it did not already have one - it would be expensive to do so without
seriously compromising performance.

But, just to tease people a bit, the combined internal and external caches
used in the above simulations have an overall read hit rate of better than
93%.  As a result, the system is relatively insensitive to main memory
performance.

[deleted, req to the world to get together for a lil old-fashioned horse-race]

>I think that's a great idea and am delighted that somebody has suggested it.
>Presumably there will be 68030s benchmarkable in hardware by then,
>and certainly 386s, Clippers, and WE32200s.  As a first suggestion,
>I'd observe that there are at least the following classes of realistic
>benchmarks:
>    1) Large FORTRAN / C floating-point ones [and there are many of these
>    that are widely available].
>    One probably needs at least 5-10 of these
>    to cover the different sorts of things that people do.

My understanding is that the LINPACK benchmark has become the standard for
the number crunching guys.  We used it at Jetas Technology when I worked
there as our standard of reference for numeric performance.  It seems to
represent well the kind of array-oriented math typical of FORTRAN.
(good old fortran - only HLL I was ever really good at 8-))

>    2) Large integer benchmarks: this is the real tough category:
>    most of the larger, realistic ones tend to be proprietary codes,
>    or else things where the code [like for assemblers, compilers, etc]
>    inherently differs among systems.  this also needs 5-10 of them,
>    and could at least include a few of the larger UNIX utilities,
>    although most of them fit into reasonable-sized caches, and hence
>    don't stress things the way larger applications do.

How about, instead, compiles?  They are usually CPU intensive (unless you
have a REALLY terrible disk system), and reflect the general non-numeric
work-load typical of most CPUs.  I recall reading analyses of instruction
mixes from many different non-numeric applications that show they don't
vary much.  Since compiling Unix source files is a "portable test", it
might be suitable...

>    3) Multi-user and/or systems benchmarks, using UNIX.  Run shell
>    scripts, etc.  I'd think there should at least be a few of these.
>One might want to focus on 1&2, if only to avoid the arguments on 3
>regarding different peripheral choices, operating system tuning, etc,
>unless the shootout is intended as an OS shootout also.

Personally, I see item 3 as being at least as important as 1 and 2, from
a practical point of view.  Overall system performance is ultimately the
only thing that matters.
Whether system A is faster than system B because the OS is a better port,
or the compiler optimizes better, or the I/O subsystem is really hot, is of
interest to us as system designers only insofar as it helps us design
better systems.  This is the primary reason why I'm talking about the 532:
to help other hardware hacks do a good job designing it in.

In my experience all CPUs are the same speed when they're sitting on your
desk... 8-)

All for now.

Regards,
George Grenley

usual disclaimer - either I'm lying, or not...
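
The timing figures quoted in the post (30 MHz clock, a 2 clock memory
demand, a combined read hit rate of better than 93%) can be checked with
back-of-envelope arithmetic.  The sketch below is not simulator output.
The function names are hypothetical, the 10 cycle miss cost is an invented
placeholder because the post deliberately withholds miss cycles, and
treating the two cache levels as a single cache with a 2 cycle hit is a
simplification.

```python
CLOCK_HZ = 30_000_000          # from the post: 30 MHz
CYCLE_NS = 1e9 / CLOCK_HZ      # one clock period, about 33.3 ns


def access_ns(cycles):
    """Convert a cycle count to nanoseconds at the 30 MHz clock."""
    return cycles * CYCLE_NS


def average_read_ns(hit_rate, hit_cycles, miss_cycles):
    """Average read time when a fraction hit_rate of reads hit the cache."""
    average_cycles = hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles
    return access_ns(average_cycles)


# Two clocks at 30 MHz; the post truncates this to 66 nsec.
print(round(access_ns(2), 1))  # 66.7

# HYPOTHETICAL miss cost of 10 cycles, chosen only to show the shape of
# the calculation; the real figure is not given in the post.
print(round(average_read_ns(0.93, 2, 10), 1))
```

With a high hit rate the miss term carries little weight, which is the
sense in which the system is described as relatively insensitive to main
memory performance.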