Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!snorkelwacker.mit.edu!ai-lab!zurich.ai.mit.edu!jinx
From: jinx@zurich.ai.mit.edu (Guillermo J. Rozas)
Newsgroups: comp.lang.scheme
Subject: Re: Benchmarking Scheme
Message-ID: <JINX.91May2174641@chamarti.ai.mit.edu>
Date: 2 May 91 21:46:41 GMT
References: <JAFFER.91May1163805@kleph.ai.mit.edu>
	<1991May2.155838.20830@bronze.ucs.indiana.edu>
Sender: news@ai.mit.edu
Reply-To: jinx@zurich.ai.mit.edu
Distribution: comp.lang.scheme
Organization: M.I.T. Artificial Intelligence Lab.
Lines: 187
In-reply-to: huntley@copper.ucs.indiana.edu's message of 2 May 91 15:58:38 GMT


    | So if anyone can suggest solitions to these problems I would be very
    | interested.  By the way, on kleph MITScheme takes 22 secs for (pi 100)
    | interpreted (185 times slower) and about 8 secs for (pi 100) compiled
    | (about 67 times slower) so I guess my interpreter is not doing too
    | badly.

    I'd say that your interpreter seems to be incredibly fast!  Is it an
    really an interpreter or actually a compiler?  Does it support generic
    arithmetic?  (fixnums, bignums, and flonums)  Does it do run-time
    checks to see if the heap is nearly full?  Does it perform any type
    checks at run-time?  

The numbers don't tell the whole story.  When analyzing benchmarks you
have to be very careful.  The benchmark does not so much test the
speed of the interpreter or compiler used for evaluation, but mostly
the speed of the numeric and IO libraries in the system.  I will
justify my claims by showing the results of a few tests that I ran
using MIT Scheme's compiler.

I'm not trying to justify MIT Scheme's apparent slowness, nor make a
sales pitch for it.  I'm just using it as an exploratory tool because
I know it well.

I have three versions of the benchmark:

    A: tst-initial, the program sent by Aubrey Jaffer without modification.
    In MIT Scheme the program will do out-of-line calls for all numeric
    operations.

    B: tst-generic, the program modified (in an MIT-Scheme-specific way) to
    open code generic arithmetic.

    C: tst-fixnum, the program modified (in an MIT-Scheme-specific way) to
    use fixnum-only arithmetic instead of generic arithmetic.

Note that in MIT Scheme, the out-of-line versions of the generic
arithmetic procedures are VERY slow, thus if the code ever calls them,
it will not run very fast.  This is not a property of the
interpreter/compiler, but a property of the way the generic arithmetic
routines are built.  A short amount of work would fix this, but no one
has really worked on improving the speed of the standard library.

Here is a transcript, with some comments.

- The code was run on chamartin.ai.mit.edu, an HP9000s350 with 32
Mbytes of memory identical to Kleph, where Aubrey ran his tests.  The
HP9000s350 has a 25 MHz 68020.

- SHOW-TIME prints times in milliseconds.  

    1 ]=> (load "/zu/jinx/tmp/tst-initial")


    Loading "/zu/jinx/tmp/tst-initial.com" -- done
    ;Value: pi

    1 ]=> (show-time (lambda () (pi 100)))

    00003 14159 26535 89793 23846 26433 83279 50288 41971 69399
    37510 58209 74944 59230 78164 06286 20899 86280 34825 34211
    70679 

    process time: 8120; real time: 8386
    ;No value

1: Pretty much as reported by Aubrey.

    1 ]=> (load "/zu/jinx/tmp/tst-generic")


    Loading "/zu/jinx/tmp/tst-generic.com" -- done
    ;Value: pi

    1 ]=> (show-time (lambda () (pi 100)))

    00003 14159 26535 89793 23846 26433 83279 50288 41971 69399
    37510 58209 74944 59230 78164 06286 20899 86280 34825 34211
    70679 

    process time: 2900; real time: 3086
    ;No value

2: Not much of an improvement.

    1 ]=> (fluid-let ((r 10000))
      (show-time (lambda () (pi 128))))

    00003 01415 09265 03589 07932 03846 02643 03832 07950 02884
    01971 06939 09375 01058 02097 04944 05923 00781 06406 02862
    00899 08628 00348 02534 02117 00679 

    process time: 660; real time: 784
    ;No value

3: Noticeable improvement even though it is run on a larger parameter.
BTW, the leading 0 in all the strings should be ignored.  The program
is not parameterized correctly according to the value of R.

    1 ]=> (load "/zu/jinx/tmp/tst-fixnum")


    Loading "/zu/jinx/tmp/tst-fixnum.com" -- done
    ;Value: pi

    1 ]=> (fluid-let ((r 10000))
      (show-time (lambda () (pi 128))))

    00003 01415 09265 03589 07932 03846 02643 03832 07950 02884
    01971 06939 09375 01058 02097 04944 05923 00781 06406 02862
    00899 08628 00348 02534 02117 00679 

    process time: 520; real time: 591
    ;No value

4: Not much of an improvement.  See below.

Explanation:

1: Shows the results as reported by Aubrey.  Even when compiled, the
code runs slowly in MIT Scheme.

2: The results using open-coded generic arithmetic.  Not much faster.

3: Same program as above but with a smaller value of R and a larger
value of N to generate the same digits.  It should be slower (the
program is roughly quadratic on N) but runs much faster.  What's the
deal?  The answer is that the unmodified program occasionally uses
bignums in MIT Scheme because the range of fixnums in MIT Scheme is
not large enough to accomodate the range of values of Q.  By reducing
R, we avoid bignums, and the program runs much faster.  Note that in
MIT Scheme fixnums only cover the range -2^25 <= x <= (2^25)-1, and Q
in the loop in Aubrey's program exceeds this range every 8-10
iterations.  I believe that Aubrey's scm uses fixnums in the range 
-2^29 <= x (2^29)-1, but I'm not certain.

4: Not much of a difference between generic arithmetic and fixnum arithmetic.
Most of the time at this point is spent in NUMBER->STRING, DISPLAY,
etc., and only speeding those up will make the code run faster.

Clearly (because of 2 and 3) the time taken by the program is not so
much a consequence of the speed of the interpreter/compiler as a
consequence of the particular procedures being called and the
range of values in use.

Thus the benchmark tests the speed of the support library, not the
speed of the underlying interpreter or compiler.  

It is a valid benchmark (most programs are), but we need to draw the
correct conclusions from the results.

To test the inherent speed of an interpreter or compiler, I would
write a program that does not use any of the constants (ie.
primitives) in the language.  For example, it could use Church pairs
and Church integers.  Thus most (or all) differences in the runtime
libraries would be removed.  Of course, we could not then compare it
to C.


PS: For those who care, the code in B was the benchmark as sent by
Aubrey, preceded by

(declare (usual-integrations)
	 (integrate-external "/scheme/runtime/fixart"))

(define-integrable (quo x y)
  (if (and (fix:fixnum? x) (fix:fixnum? y))
      (fix:quotient x y)
      (quotient x y)))

(define-integrable (rem x y)
  (if (and (fix:fixnum? x) (fix:fixnum? y))
      (fix:remainder x y)
      (remainder x y)))

(define (mod x y)
  (let ((r (rem x y)))
    (if (or (zero? r)
	    (< x 0)
	    (< y 0)
	    (not (< y 0)))
	r
	(+ r y))))

And with the renames quotient->quo, remainder->rem, modulo->mod.
The reason is that (due to oversight) none of the procedures above are
open-coded by the compiler.