Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!im4u!oakhill!davet
From: davet@oakhill.UUCP
Newsgroups: comp.sys.m68k,comp.sys.intel
Subject: Re: 386 vs 020 and big benchmarks (sieve)
Message-ID: <866@oakhill.UUCP>
Date: Sat, 18-Apr-87 07:28:36 EST
Article-I.D.: oakhill.866
Posted: Sat Apr 18 07:28:36 1987
Date-Received: Sun, 19-Apr-87 02:05:15 EST
References: <930@intsc.UUCP> <513@omen.UUCP> <933@intsc.UUCP>
Reply-To: davet@oakhill.UUCP (Dave Trissel)
Distribution: comp
Organization: Motorola Inc. Austin, Tx
Lines: 72
Xref: utgpu comp.sys.m68k:338 comp.sys.intel:143

[Sigh.  I guess I'll have to put my marketing hat on :-(  ]

In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

>                 ....                                     That is what I
>meant when I said that the benchmarks that show the 020 as faster fit into
>256 bytes.  

As several others have indicated, the instructions executed most often in
program loops usually fit into a small instruction cache.  Look at a typical
huge number-crunching program.  It consists of loops within loops within loops.
However, at any one time it is usually executing in an innermost loop somewhere,
and these tend not to be huge expanses of code.  This has been my experience
with large astrophysics programs.  (If anyone else finds their experience is
different, let's hear from you.)
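
A minimal sketch of that loop structure, assuming a plain matrix multiply
stands in for the number-crunching kernel (the names N and matmul are made up
for illustration and come from no real program):

```c
#include <stddef.h>

#define N 64   /* hypothetical problem size, chosen only for illustration */

/* c = a * b for N x N matrices: three nested loops, but nearly all of the
   execution time is spent in the one-statement innermost loop. */
void matmul(double a[N][N], double b[N][N], double c[N][N])
{
    size_t i, j, k;

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            double sum = 0.0;
            for (k = 0; k < N; k++)   /* innermost loop: a few instructions */
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}
```

However large the arrays get, the instructions fetched over and over are
those of the k loop.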

>      ...                        There is a number of application in the
>embedded control area that have inner loops that fit nicely in 256 bytes.
>Line drawing routines in graphics applications is one, thats why we build
>H/W accelerators for that.
>

Huh?  The only way I can interpret this is that since the 386 doesn't have
a small instruction cache, designers have to include hardware accelerators to
do line drawing routines and graphics applications.

>If all you want to do is calculate sieves all day then use the '020.  But
>if you want to do real crunching on large problems then the 386 will run
>circles around the '020.                ...
>If performance is what you need on Megabyte
>size problems the 386 will give you 50% - 75% more speed at the same clock
>rate.

People in this newsgroup recognize marketing hype like this when they see it.
All it does in most people's minds is invalidate any data you give out in
support of your cause.  Just present your "facts" and let them speak for
themselves.

What's most surprising, however, is that in giving us the output of your
compiler you have shown one big reason for doubting your above claims.

Notice the line in the sieve:

	register int i,p,k,c,n;

and then notice your compiler fails to assign the variable 'c' to a register
for the statement 'c++;':

>.L29:
>	incl	-4(%ebp)

This would imply that, since the variable 'c' is fourth in the list, your
compiler on the 386 is limited to supporting only three register variables.
The same compiler for the M68000 assigns all five variables to registers, which
is only *HALF* of the ten available for variable assignment.

If the 386 only supports three register variables and in this tiny benchmark
(which you love to deride as being insignificant) the 386 actually runs
out of registers to assign, how are we supposed to believe your claims that
on real meaty applications the 386 actually performs better than other
architectures with plenty of registers?
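
For reference, a sketch of the sieve in the usual BYTE benchmark form,
assuming that is the version under discussion.  The thread quotes only the
declaration and the compiler output, so the loop bodies, SIZE, and the
function name sieve_count are assumptions here:

```c
#define SIZE 8190               /* assumed: the customary BYTE sieve size */

static char flags[SIZE + 1];    /* flags[i] stands for the odd number 2i+3 */

/* Runs 10 passes of the sieve and returns the count from the last pass. */
int sieve_count(void)
{
    register int i, p, k, c, n; /* the declaration quoted in the post */

    c = 0;
    for (n = 1; n <= 10; n++) {
        c = 0;
        for (i = 0; i <= SIZE; i++)
            flags[i] = 1;
        for (i = 0; i <= SIZE; i++) {
            if (flags[i]) {
                p = i + i + 3;
                for (k = i + p; k <= SIZE; k += p)
                    flags[k] = 0;
                c++;            /* the statement compiled to incl -4(%ebp) */
            }
        }
    }
    return c;
}
```

If a compiler honors only the first three register requests, the fourth and
fifth names in the declaration end up on the stack, which would match the
memory operand in the quoted incl.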

>>  Execute                Code
>> Real    User    System
>> 
>> .34     .3416   Definicom SYS 68020 25mHz SiVlly 11/86
>> .56     .56     CompDyn (Intel MB) + 386 Toolkit 12/86
>  .59     .59     Intel 310/386 16MHz Unix V.3 rcc  4/16
   .46     .46     Motorola VME310 16MHz Unix V.3 pcc2 4/87

 -- Dave Trissel  Motorola Semiconductor Inc., Austin, Texas
	{ihnp4,seismo}!ut-sally!im4u!oakhill!davet