Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!intelca!intsc!tomk
From: tomk@intsc.UUCP (Tom Kohrs @fae)
Newsgroups: comp.sys.m68k,comp.sys.intel
Subject: 386 vs 020 and big benchmarks (sieve)
Message-ID: <933@intsc.UUCP>
Date: Thu, 16-Apr-87 20:56:01 EST
Article-I.D.: intsc.933
Posted: Thu Apr 16 20:56:01 1987
Date-Received: Sun, 19-Apr-87 10:41:32 EST
References: <930@intsc.UUCP> <513@omen.UUCP>
Distribution: comp
Organization: Intel Sales, Silicon Valley, Ca.
Lines: 96
Xref: mnetor comp.sys.m68k:363 comp.sys.intel:159

In article <513@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>
> :Show me a benchmark that does not fit in 256 bytes thats even keeps up
                                            ^^^^^^^^^ (note for ref.)
> :with at 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
> :25MHz soon.
>
> Well, here's one that takes 8k, somewhat larger than 256 bytes.  A 25 mHz
> 68020 board more than keeps up with a 18 mHz 386 box (let alone 16 mHz.)
> Code left in for reference.
> siev.c:
> #define S 8190
> char f[S+1];
> main()
> {
> /*      register long i,p,k,c,n;        For 32 bit entries for PC */
>         register int i,p,k,c,n;
>         for (n = 1; n <= 10; n++) {
>                 c = 0;
>                 for (i = 0; i <= S; i++) f[i] = 1;
 ___
|                 for (i = 0; i <= S; i++) {
|                         if (f[i]) {
|                                 p = i + i + 3; k = i + p;
|                                 while (k <= S) { f[k] = 0; k += p; }
|                                 c++;
|                         }
|                 }
|__
>         }
>         printf("\n%d primes.\n", c);
> }

The following is the assembler output of the rcc compiler (no opt.)
under Unix for the inner loop of the sieve benchmark as included above:

.L21:
        xorl    %edi,%edi
        jmp     .L26
.L27:
        cmpb    $0,f(%edi)
        je      .L28
        movl    %edi,%eax
        addl    %edi,%eax
        leal    3(%eax),%eax
        movl    %eax,%esi
        movl    %edi,%eax
        addl    %esi,%eax
        movl    %eax,%ebx
        jmp     .L30
.L31:
        movb    $0,f(%ebx)
        movl    %esi,%eax
        addl    %eax,%ebx
.L30:
        cmpl    $8190,%ebx
        jle     .L31
.L29:
        incl    -4(%ebp)
.L28:
        incl    %edi
.L26:
        cmpl    $8190,%edi
        jle     .L27

The compiler generated ~62 bytes of code (if I ever figure out sdb I will
know for sure).  Assuming the 020 compiler does not generate more than 4X
the amount of code, this will all fit into the 020 cache.  That is what I
meant when I said that the benchmarks that show the 020 as faster fit into
256 bytes.  If all you want to do is calculate sieves all day, then use the
'020.  But if you want to do real crunching on large problems, then the 386
will run circles around the '020.

That is not to say the '020 with the 256 byte cache does not have its
niches.  There are a number of applications in the embedded control area
that have inner loops that fit nicely in 256 bytes.  Line drawing routines
in graphics applications are one; that's why we build H/W accelerators for
that.  If performance is what you need on megabyte-size problems, the 386
will give you 50% to 75% more speed at the same clock rate.

BTW: The numbers for the 386 on the 18MHz CompDyn (.59sec) matched what
I got under Unix on my MB-I box (.59sec).  The biggest performance hit on
this benchmark for the systems tested is due to the wait states taken for
a write (3ws on the MB-I board).  This could easily be fixed on a system
with posted writes or a write-back cache.

>  Compile-Link     Execute      Code
>  Real    User    Real   User   Bytes  System
>
>   7.4     .8     .34    .3416   124   Definicom SYS 68020 25mHz SiVlly 11/86
>  11.8    2.8     .56    .56     131   CompDyn (Intel MB) + 386 Toolkit 12/86
    3.0     .6     .59    .59      ?    Intel 310/386 16MHz Unix V.3 rcc 4/16