Newsgroups: comp.lang.scheme Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!bronze!copper!huntley From: huntley@copper.ucs.indiana.edu (Haydn Huntley) Subject: Re: Benchmarking Scheme Message-ID: <1991May2.155838.20830@bronze.ucs.indiana.edu> Sender: news@bronze.ucs.indiana.edu (USENET News System) Organization: Indiana University, Bloomington References: Distribution: comp.lang.scheme Date: Thu, 2 May 91 15:58:38 GMT Lines: 105 In article jaffer@zurich.ai.mit.edu (Aubrey Jaffer) writes: | At the end of this message is benchmark code in C and in Scheme. My | original hope was that computing the ratios of execution time for the | 2 programs would allow me to roughly measure speed of an | implementation independent of the hardware. There are problems with | this approach. | | The benchmark programs take an argument and compute that many digits | of pi. The programs take time proportional to the square of the | number of digits. This allows one to compensate for widely differing | times of computation. | | When I measure the times on my 10Mhz 68000 I get: | (pi 100) takes 33.9 secs (scm2d) | time pi 200 takes 7.1 secs | 33.9/(7.1/4) ==> 19 times slower than C | | But on kleph.ai.mit.edu (which I think has a 68020) I get: | (pi 100) takes 6.6 secs (scm2d) | time pi 800 takes 7.6 secs | 6.6/(7.6/64) ==> 56 times slower that C ??! | | What is going on here? J. Finger suggests that kleph has an | instruction cache. That would mean that the C inner loop stays in the | cache while the scm interpreter doesn't. I also performed some similar benchmarks on a VAX 8650 and a Sun 3, and compared your C version compiled with GCC, and your Scheme version compiled with Chez Scheme 3.2 and found that the C versions were 20 to 30 times faster. The times I measured were: On a VAX 8650 (copper.ucs.indiana.edu) (pi 100) takes 1.84 secs (Chez Scheme 3.2) (pi 800) takes 106.93 secs (Chez Scheme 3.2) time pi 100 takes .06 secs (GCC 1.39) time pi 800 takes 3.54 secs (GCC 1.39) ratio of Scheme to C is approx. 30 On a Sun 3 with a 16Mhz 68020 (cssun2.ucs.indiana.edu) (pi 100) takes 4.84 secs (Chez Scheme 3.2) (pi 800) takes 246.46 secs (Chez Scheme 3.2) time pi 100 takes .2 secs (GCC 1.39) time pi 800 takes 10.0 secs (GCC 1.39) ratio of Scheme to C is approx. 24 So what is going on here? Well, simply put, Scheme is not nearly as efficient a language for executing this benchmark as C is. In general, Scheme can be a more convenient language to program because it does garbage collection for you, while a language like C requires more care when doing dynamic memory allocation and reclaimation. On the other hand, C is usually much more efficient, because C compilers can usually store variables in registers, and at the worst, they can always put them on the stack, while Scheme compilers sometimes must store things out in the heap, and use registers less advantagously than C compilers do. In addition, Scheme has latent typing and generic arithmetic, which require tests to be made at run-time, while C has static typing, so it doesn't have to do any tests on the types of variables. I would guess (and I'm just a lowly CS grad student) that the difference between using a cache, and missing it's use, typically would only account for a speed difference of at most a factor of 4, and more probably only a factor of 2. Did you use the same compiler and same options on both your 68000 and 68020? Do they both run at the same clock rate? Your C benchmarks show that kaleph is 8 times faster than your 68000! | So what are the prospects for machine independent performance | measures? An algorithm with a more complicated inner loop would keep | the C program out of cache. But such algorithms tend either to be | artifically lengthened or lack the nice scaling properties of the pi | programs. Part of the appeal to me of the pi program is that it does | an actual difficult computation with the best code I can devise. | Previously I used an intentionally less that optimal fibonacci. | | An argument can also be made that the cached figures are more | realistic for everyday programming problems. This has an interesting | consequence in that the penalty for using interpreters on newer | machines is more severe than on old ones. | | So if anyone can suggest solitions to these problems I would be very | interested. By the way, on kleph MITScheme takes 22 secs for (pi 100) | interpreted (185 times slower) and about 8 secs for (pi 100) compiled | (about 67 times slower) so I guess my interpreter is not doing too | badly. I'd say that your interpreter seems to be incredibly fast! Is it an really an interpreter or actually a compiler? Does it support generic arithmetic? (fixnums, bignums, and flonums) Does it do run-time checks to see if the heap is nearly full? Does it perform any type checks at run-time? Please don't laugh at these questions, but I've been working on a Scheme compiler this semester, and I learned how expensive it is to implement generic arithmetic, garbage collection, and type checking! Each one of them costs something like a factor of 2 in performance! --Haydn -- ;; ***************************************************** ;; * Haydn Huntley huntley@copper.ucs.indiana.edu * ;; *****************************************************