Newsgroups: comp.lang.scheme
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!bronze!copper!huntley
From: huntley@copper.ucs.indiana.edu (Haydn Huntley)
Subject: Re: Benchmarking Scheme
Message-ID: <1991May2.155838.20830@bronze.ucs.indiana.edu>
Sender: news@bronze.ucs.indiana.edu (USENET News System)
Organization: Indiana University, Bloomington
References: <JAFFER.91May1163805@kleph.ai.mit.edu>
Distribution: comp.lang.scheme
Date: Thu, 2 May 91 15:58:38 GMT
Lines: 105

In article <JAFFER.91May1163805@kleph.ai.mit.edu> jaffer@zurich.ai.mit.edu (Aubrey Jaffer) writes:
| At the end of this message is benchmark code in C and in Scheme.  My
| original hope was that computing the ratios of execution time for the
| 2 programs would allow me to roughly measure speed of an
| implementation independent of the hardware.  There are problems with
| this approach.
| 
| The benchmark programs take an argument and compute that many digits
| of pi.  The programs take time proportional to the square of the
| number of digits.  This allows one to compensate for widely differing
| times of computation.
| 
| When I measure the times on my 10Mhz 68000 I get:
| (pi 100)       takes 33.9 secs (scm2d)
| time pi 200    takes 7.1 secs
| 33.9/(7.1/4) ==> 19 times slower than C
| 
| But on kleph.ai.mit.edu (which I think has a 68020) I get:
| (pi 100)       takes 6.6 secs (scm2d)
| time pi 800    takes 7.6 secs
| 6.6/(7.6/64) ==> 56 times slower that C ??!
| 
| What is going on here?  J. Finger suggests that kleph has an
| instruction cache.  That would mean that the C inner loop stays in the
| cache while the scm interpreter doesn't.

I also performed some similar benchmarks on a VAX 8650 and a Sun 3,
and compared your C version compiled with GCC, and your Scheme version
compiled with Chez Scheme 3.2 and found that the C versions were 20 to
30 times faster.  The times I measured were:

On a VAX 8650 (copper.ucs.indiana.edu)
(pi 100)       takes   1.84 secs (Chez Scheme 3.2)
(pi 800)       takes 106.93 secs (Chez Scheme 3.2)
time pi 100    takes    .06 secs (GCC 1.39)
time pi 800    takes   3.54 secs (GCC 1.39)
ratio of Scheme to C is approx. 30

On a Sun 3 with a 16Mhz 68020 (cssun2.ucs.indiana.edu)
(pi 100)       takes   4.84 secs (Chez Scheme 3.2)
(pi 800)       takes 246.46 secs (Chez Scheme 3.2)
time pi 100    takes    .2  secs (GCC 1.39)
time pi 800    takes  10.0  secs (GCC 1.39)
ratio of Scheme to C is approx. 24

So what is going on here?  Well, simply put, Scheme is not nearly as
efficient a language for executing this benchmark as C is.  In
general, Scheme can be a more convenient language to program because
it does garbage collection for you, while a language like C requires
more care when doing dynamic memory allocation and reclaimation.  On
the other hand, C is usually much more efficient, because C compilers
can usually store variables in registers, and at the worst, they can
always put them on the stack, while Scheme compilers sometimes must
store things out in the heap, and use registers less advantagously
than C compilers do.

In addition, Scheme has latent typing and generic arithmetic, which
require tests to be made at run-time, while C has static typing, so it
doesn't have to do any tests on the types of variables.

I would guess (and I'm just a lowly CS grad student) that the
difference between using a cache, and missing it's use, typically
would only account for a speed difference of at most a factor of 4,
and more probably only a factor of 2.

Did you use the same compiler and same options on both your 68000 and
68020?  Do they both run at the same clock rate?  Your C benchmarks
show that kaleph is 8 times faster than your 68000!

| So what are the prospects for machine independent performance
| measures?  An algorithm with a more complicated inner loop would keep
| the C program out of cache.  But such algorithms tend either to be
| artifically lengthened or lack the nice scaling properties of the pi
| programs.  Part of the appeal to me of the pi program is that it does
| an actual difficult computation with the best code I can devise.
| Previously I used an intentionally less that optimal fibonacci.
| 
| An argument can also be made that the cached figures are more
| realistic for everyday programming problems.  This has an interesting
| consequence in that the penalty for using interpreters on newer
| machines is more severe than on old ones.
| 
| So if anyone can suggest solitions to these problems I would be very
| interested.  By the way, on kleph MITScheme takes 22 secs for (pi 100)
| interpreted (185 times slower) and about 8 secs for (pi 100) compiled
| (about 67 times slower) so I guess my interpreter is not doing too
| badly.

I'd say that your interpreter seems to be incredibly fast!  Is it an
really an interpreter or actually a compiler?  Does it support generic
arithmetic?  (fixnums, bignums, and flonums)  Does it do run-time
checks to see if the heap is nearly full?  Does it perform any type
checks at run-time?  

Please don't laugh at these questions, but I've been working on a
Scheme compiler this semester, and I learned how expensive it is to
implement generic arithmetic, garbage collection, and type checking!
Each one of them costs something like a factor of 2 in performance!

--Haydn

--
;;  *****************************************************
;;  *  Haydn Huntley    huntley@copper.ucs.indiana.edu  *
;;  *****************************************************