Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!im4u!oakhill!davet
From: davet@oakhill.UUCP
Newsgroups: comp.sys.m68k,comp.sys.intel
Subject: Re: 386 vs 020 and big benchmarks (sieve)
Message-ID: <866@oakhill.UUCP>
Date: Sat, 18-Apr-87 07:28:36 EST
Article-I.D.: oakhill.866
Posted: Sat Apr 18 07:28:36 1987
Date-Received: Sun, 19-Apr-87 02:05:15 EST
References: <930@intsc.UUCP> <513@omen.UUCP> <933@intsc.UUCP>
Reply-To: davet@oakhill.UUCP (Dave Trissel)
Distribution: comp
Organization: Motorola Inc. Austin, Tx
Lines: 72
Xref: utgpu comp.sys.m68k:338 comp.sys.intel:143

[Sigh. I guess I'll have to put my marketing hat on :-( ]

In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> ....                                              That is what I
>meant when I said that the benchmarks that show the 020 as faster fit into
>256 bytes.

As several others have indicated, the instructions executed most in program
loops usually fit into a small instruction cache. Look at a typical huge
number-crunching program. It consists of loops within loops within loops.
However, at any one time it is usually executing in an innermost loop
somewhere, and these tend not to be huge expanses of code. This has been my
experience with large astrophysics programs. (If anyone else finds their
experience is different, let's hear from you.)

> ...                      There is a number of application in the
>embedded control area that have inner loops that fit nicely in 256 bytes.
>Line drawing routines in graphics applications is one, thats why we build
>H/W accelerators for that.
>

Huh? The only way I can interpret this is that since the 386 doesn't have a
small instruction cache, designers have to include hardware accelerators to
do line drawing routines and graphics applications.

>If all you want to do is calculate sieves all day then use the '020. But
>if you want to do real crunching on large problems then the 386 will run
>circles around the '020. ...
>If performance is what you need on Megabyte
>size problems the 386 will give you 50% - 75% more speed at the same clock
>rate.

People in this newsgroup recognize marketing hype like this when they see
it. All it does in most people's minds is invalidate any data you give out
in support of your cause. Just present your "facts" and let them speak for
themselves.

What's most surprising, however, is that in giving us the output of your
compiler you have shown one big reason for doubting your above claims.
Notice the line in the sieve:

	register int i,p,k,c,n;

and then notice your compiler fails to assign the variable 'c' to a register
for the statement 'c++;':

>.L29:
>	incl	-4(%ebp)

This would imply that, since the variable 'c' is fourth in the list, your
compiler on the 386 is limited to supporting only three register variables.
The same compiler for the M68000 assigns all five variables to registers,
which is only *HALF* of the ten available for variable assignment.

If the 386 only supports three register variables, and in this tiny
benchmark (which you love to deride as being insignificant) the 386 actually
runs out of registers to assign, how are we supposed to believe your claims
that on real meaty applications the 386 actually performs better than other
architectures with plenty of registers?

>>      Execute Code
>>   Real   User    System
>>
>>   .34    .3416   Definicom SYS 68020 25mHz SiVlly 11/86
>>   .56    .56     CompDyn (Intel MB) + 386 Toolkit 12/86
>    .59    .59     Intel 310/386 16MHz Unix V.3 rcc 4/16
     .46    .46     Motorola VME310 16MHz Unix V.3 pcc2 4/87

  -- Dave Trissel  Motorola Semiconductor Inc., Austin, Texas
     {ihnp4,seismo}!ut-sally!im4u!oakhill!davet