Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!udel!rochester!pt.cs.cmu.edu!o.gp.cs.cmu.edu!ram
From: ram+@cs.cmu.edu (Rob MacLachlan)
Newsgroups: comp.lang.scheme
Subject: Re: Benchmarking Scheme
Message-ID: <1991May3.172024.21889@cs.cmu.edu>
Date: 3 May 91 17:20:24 GMT
References: <JAFFER.91May1163805@kleph.ai.mit.edu> <1991May2.155838.20830@bronze.ucs.indiana.edu>
Sender: netnews@cs.cmu.edu (USENET News Group Software)
Distribution: comp.lang.scheme
Organization: School of Computer Science, Carnegie Mellon
Lines: 112

In article <1991May2.155838.20830@bronze.ucs.indiana.edu> huntley@copper.ucs.indiana.edu (Haydn Huntley) writes:

>I also performed some similar benchmarks on a VAX 8650 and a Sun 3,
>and compared your C version compiled with GCC, and your Scheme version
>compiled with Chez Scheme 3.2 and found that the C versions were 20 to
>30 times faster.
>
>On a Sun 3 with a 16Mhz 68020 (cssun2.ucs.indiana.edu)
>(pi 100)       takes   4.84 secs (Chez Scheme 3.2)
>(pi 800)       takes 246.46 secs (Chez Scheme 3.2)
>time pi 100    takes    .2  secs (GCC 1.39)
>time pi 800    takes  10.0  secs (GCC 1.39)
>ratio of Scheme to C is approx. 24

On a DECStation 3100 running Mach:
(pi 800)       takes  2.7 secs (CMU Common Lisp Apr-28-1991, speed 3, safety 0)
time pi 800    takes  2.1 secs (MIPS cc -O4)
ratio of Lisp to C is 1.28

>So what is going on here?  Well, simply put, Scheme is not nearly as efficient
>a language for executing this benchmark as C is.  [...]  Scheme can be a more
>convenient language to program [...]  while a language like C requires more
>care. [...]  On the other hand, C is usually much more efficient, because C
>compilers can usually store variables in registers, and at the worst, they can
>always put them on the stack, while Scheme compilers sometimes must store
>things out in the heap, and use registers less advantagously than C compilers
>do.

Yes, register allocation and avoidance of number consing are crucial to good
numeric performance.  But Scheme could do it too.

>In addition, Scheme has latent typing and generic arithmetic, which
>require tests to be made at run-time, while C has static typing, so it
>doesn't have to do any tests on the types of variables.

Supporting dynamic typing and generic arithmetic doesn't mean you have to use
it all the time.  That's what type inference and declarations are for.

>| So what are the prospects for machine independent performance
>| measures? Part of the appeal to me of the pi program is that it does
>| an actual difficult computation with the best code I can devise.
>| Previously I used an intentionally less that optimal fibonacci.

Both of these benchmarks can be meaningful, but they measure *totally*
different things.  The recursive fibonacci is basically a function call
benchmark (like the traditional TAK) which means that it can probably predict
the performance of most lisp programs within a factor of 3.  Given constant
compiler technology, integer Specmarks will be a much better predictor.

This PI program is doing extended precision arithmetic.  Unless you are
implementing extended precision arithmetic in Lisp (unusual but possible), then
you are better off with another benchmark.  And if you do find a compiler that
generates reasonable code, you are probably mainly testing the speed of integer
divide.

I suggest reading comp.benchmarks for a while if you are interested in general
benchmarking concepts.

>[...] I've been working on a Scheme compiler this semester, and I learned how
>expensive it is to implement generic arithmetic, garbage collection, and type
>checking!  Each one of them costs something like a factor of 2 in performance!

Well, they may, but they don't have to, as the CMU numbers demonstrate.  Common
Lisp is pretty much a superset of scheme, modulo call/cc.  And I don't see why
supporting call/cc should make this program run any slower, since it doesn't
use call/cc, and doesn't call any functions that might.

I believe that the slowdown in non-CMU Lisps is entirely due to generic and
bignum arithmetic.  Note that the original C program was written for 32 bit
integers, although it presumably could have been written otherwise.  Because of
this, any Lisp without efficient support for full-word integers will lose
horribly, not only calling generic arithmetic routines, but also using bignum
arithmetic.

You should be able to get reasonable performance in pretty much any Lisp by
changing the program to use FIXNUM arithmetic and adding any necessary
declarations.  CMU CL has an edge here because:
 -- You don't have to change the program.
 -- You don't have to add declarations out the wazoo.
 -- Code runs reasonably fast, even with full type checking.

Here is the CL version:
________________________________________________________________
(in-package "USER")

(defconstant r 100000)
(defun pi (n)
  (declare (type (integer 0 1000) n) (inline ceiling))
  (let* ((n (ceiling n 5))
	 (m (truncate (* n 5 3322) 1000))
	 (a (make-array (+ 1 m) :initial-element 2
			:element-type '(unsigned-byte 12))))
    (setf (aref a m) 4)
    (do ((j 1 (+ 1 j))
	 (q 0 0)
	 (b 2 (rem q r)))
	((> j n))
      (declare (type (unsigned-byte 31) j b q))
      (do ((k m (- k 1)))
	  ((zerop k))
	(declare (type (unsigned-byte 12) k))
	(setq q (+ q (* (aref a k) r)))
	(multiple-value-bind (quo rem)
			     (truncate q (+ 1 (* 2 k)))
	  (declare (type (unsigned-byte 19) quo))
	  (setf (aref a k) rem)
	  (setq q (* k quo))))
      (format t "~5,'0D" (+ b (truncate q r)))
      (if (zerop (mod j 10))
	  (terpri)
	  (write-char #\space)))
    (terpri)))