Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!pioneer!lamaster
From: lamaster@pioneer.UUCP
Newsgroups: comp.sys.m68k
Subject: Re: Recent Motorola ad seen in Byte
Message-ID: <1274@ames.UUCP>
Date: Wed, 15-Apr-87 11:54:41 EST
Article-I.D.: ames.1274
Posted: Wed Apr 15 11:54:41 1987
Date-Received: Fri, 17-Apr-87 02:12:45 EST
References: <362@sbcs.UUCP> <1466@ncr-sd.SanDiego.NCR.COM> <580@plx.UUCP> <513@omen.UUCP> <285@winchester.mips.UUCP> <518@omen.UUCP>
Sender: usenet@ames.UUCP
Reply-To: lamaster@pioneer.UUCP (Hugh LaMaster)
Distribution: comp
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 69
Keywords: Sieve, 68020 vs. 80386, benchmarks, caches

In article <518@omen.UUCP> caf@.UUCP (PUT YOUR NAME HERE) writes:
>In article <285@winchester.mips.UUCP> djl@mips.UUCP (Dan Levin) writes:
>:If what you care about is performance of a processor on very small
>:integer compute loops, then use sieve and its ilk. If what you care
>:about is performance under actual application conditions, you must
>:use benchmarks that more accurately reproduce those types of environments.
>
>I thought many programs spend time looping in fairly localized loops,
>especially on machines that lack high powered string instructions that
>are useful to C. What are "for" and "while" statements for? Note that
>
>While it is conceivable that Motorola put the 256 byte cache in the 68020
>just to help certain benchmarks, it is more likely that the cache
>actually improves performance rather inexpensively.

I agree. Two points:

1) To see whether a small cache is really going to help performance, you
need to look carefully at the memory reference pattern of the generated
code. I assume that Motorola did this.

A problem with small caches that are shared between code and data is that
data references can "take over" the cache and force unnecessary memory
references for instruction fetches, holding up instruction issue on a
pipelined machine. A separate instruction cache is usually indicated, in my
opinion, with the additional benefit of doubling the "cache bandwidth"
without complicated logic.

The Seymour Cray/Control Data/ETA/Neil Lincoln, etc. lineage of machines has
always had small instruction caches (e.g. in the range of 32-256
instructions) and NO data caches (but sometimes hundreds of registers).
These semi-RISC, load/store, pipelined (including memory references)
machines demonstrated very good performance on a wide variety of code. They
did especially well on number crunching: the many loops that fit in the
cache (the instruction "stack" on older machines - almost a cache ...) often
showed a scalar speedup of a factor of two or more. A small cache can do a
great deal of good on a pipelined machine if the net effect is to speed up
instruction issue.

The effect of data caches is much less pronounced and much less predictable.
I believe that a majority of engineering/scientific codes, and also system
code, have widely scattered memory reference patterns, so I am much more
sceptical of the effect of data caches on real world problems. However, a
small data cache is sometimes a useful substitute for registers (enter RISC
debate).

Unfortunately, data caches do have an exaggerated effect on many of the
popular small benchmarks like Dhrystone. The writers of these benchmarks
usually take the easy way out because it is hard to duplicate average code
in a concise benchmark, so systems with data caches sometimes appear faster
than they are in the real world. There are engineering/scientific benchmarks
that do not have so much of this problem (e.g. the Dongarra Linpack
benchmark, with the dimensions appropriately scaled).
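
To make the cache arguments above a little more concrete, here is a toy
sketch in Python. Everything in it is an assumption chosen for illustration
and not a model of the 68020 or of any real machine: a direct-mapped cache
with 4-byte lines, 256 bytes of total cache either shared or split 128/128,
a 128-byte loop, one data reference per instruction word, and uniformly
random data addresses. The names DirectMappedCache and run are hypothetical.
It only counts hits; there is no timing or pipeline model.

```python
import random


class DirectMappedCache:
    """Toy direct-mapped cache: one tag per line, hit/miss only."""

    def __init__(self, num_lines, line_bytes=4):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, replace the line."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        if self.tags[index] == block:
            return True
        self.tags[index] = block
        return False


def run(icache, dcache, data_span, iterations=2000, loop_bytes=128, seed=1):
    """Execute a loop of loop_bytes of code many times.

    Each instruction word fetched is followed by one data reference drawn
    uniformly from a region of data_span bytes (an assumed access pattern).
    Pass the same object as icache and dcache to model a shared cache.
    Returns (instruction hit rate, data hit rate).
    """
    rng = random.Random(seed)
    code_base = 0x0000
    data_base = 0x100000
    ihits = dhits = total = 0
    for _ in range(iterations):
        for pc in range(code_base, code_base + loop_bytes, 4):
            ihits += icache.access(pc)
            daddr = data_base + 4 * rng.randrange(data_span // 4)
            dhits += dcache.access(daddr)
            total += 1
    return ihits / total, dhits / total


if __name__ == "__main__":
    scattered = 1 << 20   # 1 MB of data touched at random
    tiny = 64             # 64-byte working set, small-benchmark style

    shared = DirectMappedCache(64)
    print("shared, scattered data:", run(shared, shared, scattered))
    print("split,  scattered data:",
          run(DirectMappedCache(32), DirectMappedCache(32), scattered))
    print("split,  tiny data set: ",
          run(DirectMappedCache(32), DirectMappedCache(32), tiny))
```

Under these assumptions, the split arrangement keeps the loop resident in
the instruction cache after the first pass, while in the shared arrangement
the scattered data references keep evicting loop instructions and the
instruction hit rate is noticeably lower. The last case shows the benchmark
effect: with a tiny data working set the data cache hits almost every time,
whereas with scattered references it almost never does.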

There does seem to be a place out there for a better scalar/system code
benchmark. I think a tree sort of a large random array is probably a better
overall benchmark (of the inherent CPU speed) than Dhrystone, but it does
not provide as good coverage of the types of code that are usually
encountered. Byte addressing, extraction of fields from records, string
searches, etc. are usually considered desirable in a benchmark to ferret out
architectural or compiler weaknesses, but the correct way to test
performance on these types of codes is a matter of debate :-)

  Hugh LaMaster, m/s 233-9,     UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035       ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117         ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")