Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!snorkelwacker.mit.edu!ai-lab!zurich.ai.mit.edu!jinx From: jinx@zurich.ai.mit.edu (Guillermo J. Rozas) Newsgroups: comp.lang.scheme Subject: Re: Benchmarking Scheme Message-ID: Date: 2 May 91 21:46:41 GMT References: <1991May2.155838.20830@bronze.ucs.indiana.edu> Sender: news@ai.mit.edu Reply-To: jinx@zurich.ai.mit.edu Distribution: comp.lang.scheme Organization: M.I.T. Artificial Intelligence Lab. Lines: 187 In-reply-to: huntley@copper.ucs.indiana.edu's message of 2 May 91 15:58:38 GMT | So if anyone can suggest solitions to these problems I would be very | interested. By the way, on kleph MITScheme takes 22 secs for (pi 100) | interpreted (185 times slower) and about 8 secs for (pi 100) compiled | (about 67 times slower) so I guess my interpreter is not doing too | badly. I'd say that your interpreter seems to be incredibly fast! Is it an really an interpreter or actually a compiler? Does it support generic arithmetic? (fixnums, bignums, and flonums) Does it do run-time checks to see if the heap is nearly full? Does it perform any type checks at run-time? The numbers don't tell the whole story. When analyzing benchmarks you have to be very careful. The benchmark does not so much test the speed of the interpreter or compiler used for evaluation, but mostly the speed of the numeric and IO libraries in the system. I will justify my claims by showing the results of a few tests that I ran using MIT Scheme's compiler. I'm not trying to justify MIT Scheme's apparent slowness, nor make a sales pitch for it. I'm just using it as an exploratory tool because I know it well. I have three versions of the benchmark: A: tst-initial, the program sent by Aubrey Jaffer without modification. In MIT Scheme the program will do out-of-line calls for all numeric operations. B: tst-generic, the program modified (in an MIT-Scheme-specific way) to open code generic arithmetic. C: tst-fixnum, the program modified (in an MIT-Scheme-specific way) to use fixnum-only arithmetic instead of generic arithmetic. Note that in MIT Scheme, the out-of-line versions of the generic arithmetic procedures are VERY slow, thus if the code ever calls them, it will not run very fast. This is not a property of the interpreter/compiler, but a property of the way the generic arithmetic routines are built. A short amount of work would fix this, but no one has really worked on improving the speed of the standard library. Here is a transcript, with some comments. - The code was run on chamartin.ai.mit.edu, an HP9000s350 with 32 Mbytes of memory identical to Kleph, where Aubrey ran his tests. The HP9000s350 has a 25 MHz 68020. - SHOW-TIME prints times in milliseconds. 1 ]=> (load "/zu/jinx/tmp/tst-initial") Loading "/zu/jinx/tmp/tst-initial.com" -- done ;Value: pi 1 ]=> (show-time (lambda () (pi 100))) 00003 14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 process time: 8120; real time: 8386 ;No value 1: Pretty much as reported by Aubrey. 1 ]=> (load "/zu/jinx/tmp/tst-generic") Loading "/zu/jinx/tmp/tst-generic.com" -- done ;Value: pi 1 ]=> (show-time (lambda () (pi 100))) 00003 14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 process time: 2900; real time: 3086 ;No value 2: Not much of an improvement. 1 ]=> (fluid-let ((r 10000)) (show-time (lambda () (pi 128)))) 00003 01415 09265 03589 07932 03846 02643 03832 07950 02884 01971 06939 09375 01058 02097 04944 05923 00781 06406 02862 00899 08628 00348 02534 02117 00679 process time: 660; real time: 784 ;No value 3: Noticeable improvement even though it is run on a larger parameter. BTW, the leading 0 in all the strings should be ignored. The program is not parameterized correctly according to the value of R. 1 ]=> (load "/zu/jinx/tmp/tst-fixnum") Loading "/zu/jinx/tmp/tst-fixnum.com" -- done ;Value: pi 1 ]=> (fluid-let ((r 10000)) (show-time (lambda () (pi 128)))) 00003 01415 09265 03589 07932 03846 02643 03832 07950 02884 01971 06939 09375 01058 02097 04944 05923 00781 06406 02862 00899 08628 00348 02534 02117 00679 process time: 520; real time: 591 ;No value 4: Not much of an improvement. See below. Explanation: 1: Shows the results as reported by Aubrey. Even when compiled, the code runs slowly in MIT Scheme. 2: The results using open-coded generic arithmetic. Not much faster. 3: Same program as above but with a smaller value of R and a larger value of N to generate the same digits. It should be slower (the program is roughly quadratic on N) but runs much faster. What's the deal? The answer is that the unmodified program occasionally uses bignums in MIT Scheme because the range of fixnums in MIT Scheme is not large enough to accomodate the range of values of Q. By reducing R, we avoid bignums, and the program runs much faster. Note that in MIT Scheme fixnums only cover the range -2^25 <= x <= (2^25)-1, and Q in the loop in Aubrey's program exceeds this range every 8-10 iterations. I believe that Aubrey's scm uses fixnums in the range -2^29 <= x (2^29)-1, but I'm not certain. 4: Not much of a difference between generic arithmetic and fixnum arithmetic. Most of the time at this point is spent in NUMBER->STRING, DISPLAY, etc., and only speeding those up will make the code run faster. Clearly (because of 2 and 3) the time taken by the program is not so much a consequence of the speed of the interpreter/compiler as a consequence of the particular procedures being called and the range of values in use. Thus the benchmark tests the speed of the support library, not the speed of the underlying interpreter or compiler. It is a valid benchmark (most programs are), but we need to draw the correct conclusions from the results. To test the inherent speed of an interpreter or compiler, I would write a program that does not use any of the constants (ie. primitives) in the language. For example, it could use Church pairs and Church integers. Thus most (or all) differences in the runtime libraries would be removed. Of course, we could not then compare it to C. PS: For those who care, the code in B was the benchmark as sent by Aubrey, preceded by (declare (usual-integrations) (integrate-external "/scheme/runtime/fixart")) (define-integrable (quo x y) (if (and (fix:fixnum? x) (fix:fixnum? y)) (fix:quotient x y) (quotient x y))) (define-integrable (rem x y) (if (and (fix:fixnum? x) (fix:fixnum? y)) (fix:remainder x y) (remainder x y))) (define (mod x y) (let ((r (rem x y))) (if (or (zero? r) (< x 0) (< y 0) (not (< y 0))) r (+ r y)))) And with the renames quotient->quo, remainder->rem, modulo->mod. The reason is that (due to oversight) none of the procedures above are open-coded by the compiler.