Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!mit-eddie!uw-beaver!tektronix!cae780!amdcad!amd!intelca!intsc!tomk
From: tomk@intsc.UUCP
Newsgroups: comp.arch
Subject: Re: Re: recent 386 timings from Intel
Message-ID: <929@intsc.UUCP>
Date: Thu, 9-Apr-87 19:54:46 EST
Article-I.D.: intsc.929
Posted: Thu Apr 9 19:54:46 1987
Date-Received: Sat, 11-Apr-87 15:07:23 EST
References: <221@winchester.mips.UUCP> <2130@intelca.UUCP> <1946@hoptoad.uucp>
Organization: Intel Sales, Silicon Valley, Ca.
Lines: 66

> In article <2130@intelca.UUCP>, clif@intelca.UUCP (Clif Purkiser) writes:
> > The system we used to run was an Intel Multibus I system
> > running Unix System V Release 3.0. The CPU board was a 386/24
> > MultiBus I which has a 64 Kbyte direct-mapped write-through
> > cache and 2-3 wait states for cache misses.

And John Gilmore replies:
>
> Hmm, let's make sure: cache hits run with 0 wait states, cache misses
> run with 2-3 wait states? I'm curious about the construction of such a
> cache. What is the basic cycle time of the machine, and how many
> cycles does a cache hit take? Is it accessing main memory over the
> Multibus, or on a local bus? Is main memory static ram, or dynamic?

The Multibus I board that was used for the measurements is a standard
production board. It has 64K bytes of direct-mapped cache based on
45ns data RAMs and 35ns tag RAMs. The DRAMs are of the 120ns access
time variety. They could have been 150's, but 120's are what we buy a
lot of.

The DRAM is local on the CPU board and is dual-ported to the Multibus.
The 386/20 board (that's what we are talking about) will support up to
16MB of DRAM (the Multibus I limit).

The DRAM cycles are not started until after a cache miss is detected.
The first access on a cache miss will cause 3 wait states. When a
cache miss occurs the CPU is switched into pipelined address mode, and
any subsequent misses will be 2 wait states.
When a cache hit occurs again, the CPU resumes operating in
non-pipelined address mode. With this setup we have measured an
average of 0.7 wait states running UNIX OS code.

The basic bus cycle time of the machine is 2 CPU clocks. At 16MHz that
is 125ns, and 100ns @ 20MHz. Each wait state adds one more clock:
62.5ns @ 16MHz and 50ns @ 20MHz. The basic instruction execution time
is 4.5 clocks on the average with some magical instruction mix
(details available on request). Adding a wait state slows down
execution approx. 20%.

For those curious about the compiler: the benchmarks were run with the
Green Hills C compiler with the optimization switch OFF. The Green
Hills technology does a lot of optimization even without the -O
switch, so it is hard to tell how badly it destroys the inner loops of
the Dhrystone benchmark. The other side of the coin, though, is that
they do the same type of optimizations on the other machines. Again,
compare systems, not CPUs. This is also why I always tell anyone
interested in the 386 to come in with their favorite benchmark and run
it on the box I have. So far the only place the 25MHz 68020's have
beaten the 16MHz 386 is when the main loop of the code fits in 256
bytes.

------
"Ever notice how your mental image of someone you've known only by
phone turns out to be wrong? And on a computer net you don't even
have a voice..."

  tomk@intsc.UUCP                       Tom Kohrs
                                        Regional Architecture Specialist
                                        Intel - Santa Clara

PS: John, there will be a 386/20 manual in the mail to you as soon as
I can find one.
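
For anyone who wants to re-derive the bus timing figures quoted above, here is a minimal sketch of the arithmetic. It assumes that a wait state lasts exactly one CPU clock (consistent with the 50ns @ 20MHz figure) and that the measured 0.7 average wait states can simply be added to the 2-clock base cycle. The function names are made up for illustration and are not from any Intel tool.

```python
# Hypothetical helpers; names and structure are illustrative only.

def clock_ns(mhz):
    """Period of one CPU clock in nanoseconds."""
    return 1000.0 / mhz

def bus_cycle_ns(mhz, wait_states=0.0, base_clocks=2):
    """Bus cycle time: 2 base clocks plus one clock per wait state (assumed)."""
    return (base_clocks + wait_states) * clock_ns(mhz)

if __name__ == "__main__":
    for mhz in (16, 20):
        print("%d MHz: clock %.2f ns, 0-wait cycle %.2f ns, "
              "cycle at 0.7 avg wait states %.2f ns"
              % (mhz, clock_ns(mhz), bus_cycle_ns(mhz),
                 bus_cycle_ns(mhz, 0.7)))
```

Under these assumptions a zero-wait cycle is 125ns at 16MHz and 100ns at 20MHz, and the first miss (3 wait states) costs 5 clocks in total.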