Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!amdahl!chuck
From: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Newsgroups: comp.arch
Subject: Re: missing Dhrystone 2.1 (1 of 3 &
Message-ID: <9a0K/cbluk1010IHSPc@amdahl.uts.amdahl.com>
Date: 20 Jul 88 06:00:31 GMT
References: <4232@cbmvax.UUCP> <76700035@p.cs.uiuc.edu> <1988Jul18.231331.19575@utzoo.uucp> <22406@amdcad.AMD.COM> <9amsvb52K11010cyawo@amdahl.uts.amdahl.com>
Reply-To: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Organization: Amdahl Corporation, Sunnyvale  CA
Lines: 94

In article <9amsvb52K11010cyawo@amdahl.uts.amdahl.com> littauer@amdahl.uts.amdahl.com (Tom Littauer) writes:
>In article <22406@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:
>>In article <1988Jul18.231331.19575@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>| In article <76700035@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:
>>| >I certainly find it hard to believe that the top of the line Amdahl
>>| >machine achieves 90,000+ Dhrystones...
>>| > It nearly doubles the performance of the best Cray compiler
>>| >reported (admittedly, compiling C for a Cray is probably hard, but
>>| >Crays are very decent scaler machines!  sheesh)...
>>| 
>>| Ah, but Crays are *not* good character machines, and Dhrystone is known
>>| to be excessively string-intensive.
>>
>>And the Amdahl machine referenced is a dual-processor model -- I assume
>>that this was 45K Dhrystones per processor...
>
>No, 91K per each of 2 in a 5990-700 and 4 in a -1400... To be fair, that's
>not the released compiler (it was GNU cc). Does GNU do Dhrystone
>fakery? Anyway, the released compiler was 74K per each and it isn't
>optimized for Dhrystone. Our processor guys bitch that Dhrystone doesn't
>show our processors to be as fast as they really are, but that's an
>entirely different discussion.

There has been some discussion recently of how an Amdahl machine
with the GNU cc compiler can achieve 91K Dhrystones per processor,
so I thought I'd describe just what the GNU cc compiler does to
achieve this performance.  I'll then let you all decide whether
GNU does "Dhrystone fakery".

First, I very much like Henry Spencer's comment about Crays not
being good character-processing machines.  Both the Cray and the
Amdahl machines use very up-to-date technology, and there is no
good reason to believe that a multi-million dollar Amdahl
machine can't execute the Dhrystone benchmark faster than a
multi-million dollar Cray.

Cray machines are optimized for programs that require lots of
memory (say 1 Gbyte or so), floating point computations, and vector
computations.  Amdahl machines are optimized for smaller amounts
of memory (say 256 Mbytes), and scalar processing that is
non-floating-point intensive.  (These are my personal beliefs until
I'm corrected by someone who knows more.)

The GNU compiler achieves extremely good Dhrystone results compared
to the current pcc-based compiler primarily through two mechanisms.
First, the GNU compiler performs reasonably good register allocation.
Since Dhrystone spends much of its time in relatively short routines
that use relatively few registers, the GNU compiler can frequently
keep all of the values needed by a routine in registers that do not
need to be saved on the stack.  Second, I have added a special-case
optimization: when a subroutine does not call any other subroutine,
much of the procedure setup code is optimized away (e.g. I don't
allocate a stack frame if I don't need one).

Thus, for Dhrystones, GCC performs good register allocation, and it
generates code that keeps the overhead of calling a subroutine to
a minimum.
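
As a rough illustration of the distinction (the routine names below
are made up for this sketch; they are not taken from the Dhrystone
source or from GCC), compare a routine that calls nothing with one
that does.  Whether a given compiler really omits the stack frame for
the first one depends on the compiler and the target, so treat the
comments as a description of what is possible, not of what any
particular compiler emits:

```c
#include <assert.h>
#include <stddef.h>

/* Leaf routine: it calls no other routine and needs only a couple of
 * scalar values.  A compiler is free to keep a, b and the result in
 * scratch registers and to skip stack-frame setup altogether. */
int leaf_compare(int a, int b)
{
    /* Returns -1, 0 or 1 without risking overflow from a - b. */
    return (a > b) - (a < b);
}

/* Non-leaf routine: it calls leaf_compare(), so the return address
 * and any values that are live across the call (v, n, pivot, i,
 * count) must be preserved somewhere, typically in saved registers
 * or in a stack frame. */
int count_greater(const int *v, size_t n, int pivot)
{
    int count = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (leaf_compare(v[i], pivot) > 0)
            count++;
    }
    return count;
}
```

One way to see what your own compiler does with these is to compile
with optimization and read the assembler output (e.g. "cc -O -S"),
looking at the entry and exit code of each routine.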

We do not use in-line subroutines (I tried, but GCC generated incorrect
code), nor do we do anything remotely resembling link-time register
allocation.  (Rest assured that if I do figure out ways to do things
like this, I will report this type of optimization with any results
that I publish.)

I would draw only the following conclusions from these GCC results:

1)  An Amdahl mainframe is lots faster than a Vax 750.

2)  For some applications, an Amdahl mainframe may outperform a Cray.

3)  The GNU C compiler does a fairly good job of register allocation
(especially in small routines that use single word registers).

4)  The GNU C compiler is easily modified to make special case
optimizations (such as allocating stack frames only when they
are needed).

In summary, I would not be offended if anyone decided that the 91K
figure that I've published should be considered a research result
rather than something that can realistically be attained using the
production compiler that we supply.  The 91K figure should be viewed
as a theoretical upper bound (a goal to shoot for), and an indication of
the types of performance levels that can be achieved in hand-coded
assembler.

Still, 74K Dhrystones per processor head isn't too shabby.

(To follow up on the last comment of Tom's...  To really benchmark
an Amdahl mainframe against other types of machines, we would prefer
a benchmark that used over 100Mbytes of memory and which did lots
of I/O.  It would be fun to publish numbers that show a 68020 based
machine thrashing on the benchmark for many hours (days?) while the
mainframe completes the job in a couple of minutes.)

-- Chuck