Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!snorkelwacker.mit.edu!hsdndev!spdcc!ima!dirtydog!suitti
From: suitti@ima.isc.com (Stephen Uitti)
Newsgroups: comp.benchmarks
Subject: Re: Ole Swang's benchmark: Sum of Harmonic Series
Message-ID: <1990Dec28.004003.6807@dirtydog.ima.isc.com>
Date: 28 Dec 90 00:40:03 GMT
References: <OLES.90Dec13213301@kelvin.uio.no> <44125@mips.mips.COM> <1990Dec19.004003.20667@dirtydog.ima.isc.com> <2720@sixhub.UUCP>
Sender: news@dirtydog.ima.isc.com (NEWS ADMIN)
Reply-To: suitti@ima.isc.com (Stephen Uitti)
Organization: Interactive Systems, Cambridge, MA 02138-5302
Lines: 88

In article <2720@sixhub.UUCP> davidsen@sixhub.UUCP (bill davidsen) writes:
>In article <1990Dec19.004003.20667@dirtydog.ima.isc.com> suitti@ima.isc.com (Stephen Uitti) writes:
>
>| The do-while is faster.  On the 386/25 with 387, the "for" loop
>| took 101.6 seconds, and the "do-while" loop took 99.8 seconds.
>
>    Nope. You have changed both the loop type and the algorithm here.
>The corresponding for loop would be
>
>	for (n=100000; --n; ) { ... }
>
>and the reason your version runs faster is that it avoids the compare,
>not because it's a do-while.
>
>  As in most benchmarks the effect of changing the program to make it
>faster also means the numbers no longer compare to the old values in a
>meaningful way.

As a benchmark of machines, it is pretty bad.  It does lots of
floating point divides, with some loop overhead.  What the benchmark
might be able to tell us is something about tuning floating point
divides or loop overhead.

Floating point divides don't happen to interest me much, but loop
overhead does, since lots of programs have it.

I'm aware that my version wasn't the same as the original - it
can even produce a different answer.  It is interesting that your
version is not the same as my do-while.  For one, it doesn't
compute 1/100000.  The condition is tested before the first pass
through the body, and your decrement is part of that condition.
	for (i = 10000000; i != 0; i--) {
is more accurate.
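
To make the difference concrete, here is a minimal sketch of the three
loop forms.  It assumes the do-while was of the form
"do { ... } while (--n);", which this post does not actually show, and
the function names are invented for illustration.  Each function sums
1/n over whatever values of n its loop visits.

```c
/* for (n = N; --n; ): the decrement runs before the first body,
 * so the loop visits N-1 down to 1 and never adds 1/N. */
double sum_predec_for(long N)
{
    double sum = 0.0;
    long n;

    for (n = N; --n; )
        sum += 1.0 / (double)n;
    return sum;
}

/* for (i = N; i != 0; i--): visits N down to 1. */
double sum_countdown_for(long N)
{
    double sum = 0.0;
    long i;

    for (i = N; i != 0; i--)
        sum += 1.0 / (double)i;
    return sum;
}

/* Assumed do-while form: visits N down to 1.  Requires N >= 1. */
double sum_do_while(long N)
{
    double sum = 0.0;
    long n = N;

    do {
        sum += 1.0 / (double)n;
    } while (--n);
    return sum;
}
```

With N = 2 the first form returns 1.0 and the other two return 1.5;
the missing 0.5 is the 1/N term.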

Oddly, this generates
	decl %eax
	jne .L5

for one compiler, and

	decl	%edi
	testl	%edi,%edi
	jne	.L70

for another on the same system.  However, the do-while generates

	decl	%edi
	jne	.L69

on the available compilers.  At one time, I thought that Dennis put
the do-while into the language just for subtract-one-and-branch
instructions.

It is more difficult for a compiler to notice that a 'for' loop
can be optimized to get rid of the compare.  In a 'for' loop, the
test really is at the top.  In a 'do-while', the test is at the
bottom, where the optimization is.
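
One way to picture what the compiler has to work out: a counting
'for' loop is equivalent to a guard followed by a do-while, and only
in the second form does the decrement sit right next to the branch.
This is a sketch with invented function names, not the output of any
particular compiler.

```c
/* The loop as written: the test is at the top. */
long count_for(long n)
{
    long passes = 0;
    long i;

    for (i = n; i != 0; i--)
        passes++;
    return passes;
}

/* The same loop rewritten by hand: one guard test up front, then the
 * decrement and the test together at the bottom, where a
 * decrement-and-branch pair can be used. */
long count_inverted(long n)
{
    long passes = 0;
    long i = n;

    if (i != 0) {
        do {
            passes++;
        } while (--i != 0);
    }
    return passes;
}
```

Both functions make the same number of passes for any n >= 0,
including n = 0, where the guard skips the body entirely.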

Now, getting rid of one of the loop overhead instructions on the
386/25 (with 387) speeds up the program by 1.8% - hardly
noticeable for any real program.  The problem is that the
instruction that is removed is one of 12 for the loop, and by
comparison, a very quick one.  Loop unrolling should do better.
However, unrolling it 10 times slows the program down to 104.6
seconds.  Stuffing 79 instructions into the loop probably means
the 386's cache isn't getting hit as much - more than undoing any
benefits.  In fact, unrolling the loop 5 times is also slower
than not unrolling.  I wonder if there are compilers out there
that can do loop unrolling that also know how big caches are.
Are they smart enough to optimize this benchmark?  The loop index
is not invariant: it gets used in the loop.
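
For reference, a hand-unrolled version looks roughly like the sketch
below.  It assumes a double accumulator and an iteration count that is
a positive multiple of 5; the function names are invented, and the
timed code may have differed in detail.  Note how every copy of the
body needs its own offset from the index.

```c
/* Rolled reference: sums 1/n for n = N down to 1.  Requires N >= 1. */
double harmonic_rolled(long N)
{
    double sum = 0.0;
    long n = N;

    do {
        sum += 1.0 / (double)n;
    } while (--n != 0);
    return sum;
}

/* Unrolled 5 times: one decrement and one branch per five divides.
 * Requires N to be a positive multiple of 5. */
double harmonic_unrolled5(long N)
{
    double sum = 0.0;
    long n = N;

    do {
        sum += 1.0 / (double)n;
        sum += 1.0 / (double)(n - 1);
        sum += 1.0 / (double)(n - 2);
        sum += 1.0 / (double)(n - 3);
        sum += 1.0 / (double)(n - 4);
        n -= 5;
    } while (n != 0);
    return sum;
}
```

The additions happen in the same order in both versions, so the
results should agree; only the loop overhead per divide changes.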

I've been attempting to get this trivial benchmark to tell me
something about the tools I have.  It doesn't have anything really
exciting to say - just give or take a couple percent.  I don't
attempt to make my compilers faster - even when I have source -
it isn't my job.  I try to use the tools available.  For example,
use multiplies over divides.  My motto has been "Don't trust the
compiler to convert 'x / 5.0' into 'x * 0.2'".
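
There is also a reason a careful compiler may decline to do this on
its own: 0.2 has no exact binary representation, so the two forms can
round differently.  A small sketch, assuming IEEE double arithmetic
(the function names are invented for illustration):

```c
/* The expression as written, with a floating point divide. */
double scale_div(double x)
{
    return x / 5.0;
}

/* The hand-tuned form, with a multiply by the nearest double to 0.2. */
double scale_mul(double x)
{
    return x * 0.2;
}
```

For x = 10.0 both return exactly 2.0, but for x = 3.0 the two results
differ in the last bit, so the substitution is a speed for accuracy
trade that the programmer has to choose.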

However, some of the older, dumber compilers produce faster code
than newer, smarter compilers.  The new ones seem to get caught
up in attempting to figure out what my pointers are doing so much
that they forget about the simpler optimizations that were
designed into the language.  I don't know whether to laugh or cry.

Stephen.
suitti@ima.isc.com