Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!snorkelwacker.mit.edu!hsdndev!spdcc!ima!dirtydog!suitti
From: suitti@ima.isc.com (Stephen Uitti)
Newsgroups: comp.benchmarks
Subject: Re: Ole Swang's benchmark: Sum of Harmonic Series
Message-ID: <1990Dec28.004003.6807@dirtydog.ima.isc.com>
Date: 28 Dec 90 00:40:03 GMT
References: <44125@mips.mips.COM> <1990Dec19.004003.20667@dirtydog.ima.isc.com> <2720@sixhub.UUCP>
Sender: news@dirtydog.ima.isc.com (NEWS ADMIN)
Reply-To: suitti@ima.isc.com (Stephen Uitti)
Organization: Interactive Systems, Cambridge, MA 02138-5302
Lines: 88

In article <2720@sixhub.UUCP> davidsen@sixhub.UUCP (bill davidsen) writes:
>In article <1990Dec19.004003.20667@dirtydog.ima.isc.com> suitti@ima.isc.com (Stephen Uitti) writes:
>
>| The do-while is faster.  On the 386/25 with 387, the "for" loop
>| took 101.6 seconds, and the "do-while" loop took 99.8 seconds.
>
>  Nope. You have changed both the loop type and the algorithm here.
>The corresponding for loop would be
>
>        for (n=100000; --n; ) { ... }
>
>and the reason your version runs faster is that it avoids the compare,
>not because it's a do-while.
>
>  As in most benchmarks the effect of changing the program to make it
>faster also means the numbers no longer compare to the old values in a
>meaningful way.

As a benchmark of machines, it is pretty bad.  It does lots of
floating point divides, with some loop overhead.  What the benchmark
might be able to tell us is something about tuning floating point
divides or loop overhead.  Floating point divides don't happen to
interest me much, but loop overhead does, since lots of programs
have it.

I'm aware that my version wasn't the same as the original - it can
even produce a different answer.  It is interesting that your version
is not the same as my do-while.  For one, it doesn't compute
1/100000.  The condition is performed before the loop, and your
decrement is there.

        for (i = 10000000; i != 0; i--) {

is more accurate.
Oddly, this generates

        decl %eax
        jne .L5

for one compiler, and

        decl %edi
        testl %edi,%edi
        jne .L70

for another on the same system.  However, the do-while generates

        decl %edi
        jne .L69

on the available compilers.  At one time, I thought that Dennis put
the do-while into the language just for subtract-one-and-branch
instructions.  It is more difficult for a compiler to notice that a
'for' loop can be optimized to get rid of the compare.  In a 'for'
loop, the test really is at the top.  In a 'do-while', the test is at
the bottom, where the optimization is.

Now, getting rid of one of the loop overhead instructions on the
386/25 (with 387) speeds up the program by 1.8% - hardly noticeable
for any real program.  The problem is that the instruction that is
removed is one of 12 in the loop, and by comparison, a very quick
one.

Loop unrolling should do better.  However, unrolling it 10 times
slows the program down to 104.6 seconds.  Stuffing 79 instructions
into the loop probably means the 386's cache isn't getting hit as
much - more than undoing any benefits.  In fact, unrolling the loop
5 times is also slower than not unrolling.

I wonder if there are compilers out there that can do loop unrolling
that also know how big caches are.  Are they smart enough to optimize
this benchmark?  The loop index is not invariant - it gets used in
the loop.

I've been attempting to get this trivial benchmark to tell me
something about the tools I have.  It doesn't have anything really
exciting to say - just give or take a couple percent.

I don't attempt to make my compilers faster - even when I have
source - it isn't my job.  I try to use the tools available.  For
example, use multiplies over divides.  My motto has been "Don't trust
the compiler to convert 'x / 5.0' into 'x * 0.2'".  However, some of
the older, dumber compilers produce faster code than newer, smarter
compilers.
The new ones seem to get caught up in attempting to figure out what
my pointers are doing so much that they forget about the simpler
optimizations that were designed into the language.  I don't know
whether to laugh or cry.

Stephen.
suitti@ima.isc.com
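
For reference, the loop shapes argued over in this thread can be put
side by side.  What follows is a minimal sketch in C, not the code of
the original benchmark: the function names, the unsigned long counter
and the double accumulator are all assumptions made for illustration.
It is only meant to show which terms each loop form adds up, in
particular that the "--n" form never adds the 1/n term.

```c
#include <assert.h>

/* Hypothetical helper: count-down 'for' loop, test at the top.
 * Adds 1/n, 1/(n-1), ..., 1/1.  Returns 0.0 when n is 0. */
double harmonic_for_down(unsigned long n)
{
    double sum = 0.0;
    unsigned long i;

    for (i = n; i != 0; i--)
        sum += 1.0 / (double)i;
    return sum;
}

/* Hypothetical helper: the "for (n = N; --n; )" form from the quoted
 * article.  The decrement runs before the first pass through the
 * body, so only 1/(n-1), ..., 1/1 are added; 1/n is skipped. */
double harmonic_for_predec(unsigned long n)
{
    double sum = 0.0;
    unsigned long i;

    assert(n >= 1);             /* n == 0 would wrap around */
    for (i = n; --i; )
        sum += 1.0 / (double)i;
    return sum;
}

/* Hypothetical helper: 'do-while' loop, test at the bottom.
 * Adds the same terms, in the same order, as harmonic_for_down. */
double harmonic_do_while(unsigned long n)
{
    double sum = 0.0;
    unsigned long i = n;

    assert(n >= 1);             /* the body always runs once */
    do {
        sum += 1.0 / (double)i;
    } while (--i != 0);
    return sum;
}
```

With n = 4, harmonic_for_down and harmonic_do_while both add
1/4 + 1/3 + 1/2 + 1, while harmonic_for_predec adds only
1/3 + 1/2 + 1, so the results differ by about 0.25.  Whether a given
compiler emits the shorter decrement-and-branch sequence for any of
these is not something the sketch can show; that still has to be
checked in the assembly output (for example with cc -S).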