Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!sei.cmu.edu!fs7.ece.cmu.edu!o.gp.cs.cmu.edu!ram
From: ram+@cs.cmu.edu (Rob MacLachlan)
Newsgroups: comp.lang.lisp
Subject: Re: isqrt
Message-ID: <1991Jun13.003122.19863@cs.cmu.edu>
Date: 13 Jun 91 00:31:22 GMT
References: <676362388.61@egsgate.FidoNet.Org>
Sender: netnews@cs.cmu.edu (USENET News Group Software)
Organization: School of Computer Science, Carnegie Mellon
Lines: 77

I have done an evaluation this "isqrt" benchmark on CMU CL systems here at CMU.
I discovered that the results being computed were wrong due to bugs in
division.  After these problems had been solved, the times were significantly
longer than those reported Miles, but still much faster than those of
commercial implementations.  (generally 2-5x)

I'd like the remark about the apparent inconsistency of high CMU bignum
primitive performance v.s. poor performance on the built-in ISQRT.
Really there is no inconsistency.  The bignum implementation has consistently
*not* been tuned.  When we first got it running, it was already much faster
than the commercial implementations.

As to why CMU is faster, it's hard to say for sure.  It isn't algorithmic
cleverness; we implemented Knuth fairly directly.  My guess is that the biggest
factor is compiler technology: the CMU bignum code is written in Lisp using a
few extra primitives like 32 x 32 -> 64.  Unlike the memory-to-memory
operations described in the Lucid bignum paper, these are register-to-register
operations.  Python efficiently open-codes 32 bit integer operations (even
using the standard CL arithmetic/logic functions.)

In the case of division (which these benchmarks are intensive with), it is also
possible that we have implemented the 64 / 32 divide more efficiently.  This
must be done as an assembly routine, since neither MIPS nor SPARC have an
extended divide instruction.

So what is the significance of this benchmark (and of bignum performance in
general)?  Well, that obviously depends on how much you use bignums.  These
sorts of large speedups for CMU are not seen for more conventional
"symbol-crunching" Lisp programs (20% is more common...)  But if you are
working on mathematical or scientific software, you will like CMU CL's dramatic
superiority in bignum and floating-point arithmetic.

The fact that CMU bignums are written in Lisp and are public domain also makes
it easy for people doing serious work with extended-precision arithmetic to
efficiently add any new primitives they need.

As you may have inferred from Miles's message, CMU CL is up on SUNOS Sparcs.
However, we are not ready to release it yet.  We will post to comp.lang.lisp.


Raw data:

The Lucid numbers were copied from the net.  The SparcStation 1+ numbers were
done at CMU using allegro 4.0.1.  Both CMU and Allegro numbers are elapsed
(real) time, not any sort of "CPU" time.

1000 digits

system\test	mid1	rec
ds5000 CMU	0.38	0.09
   "   Lucid	1.34	0.32
ds3100 CMU	0.56	0.14
   "   Lucid	2.34	0.55	
ss1+  CMU	0.77	0.27
   "  Allegro	1.65	0.38

3000 digits

system\test	mid1	rec
ds5000 CMU	3.39	0.69
   "   Lucid   13.7	2.70
ds3100 CMU	4.88	1.02
   "   Lucid   23.9	4.71
ss1+ CMU	5.37	1.15
   " Allegro   16.6	3.29

10000 digits

system\test	mid1	rec
ds5000 CMU	38.7	10.5
   "   Lucid   161	39.6
ds3100 CMU	56.9	16.6
   "   Lucid   284	69.5
ss1+  CMU	64.5	17.6
   "  Allegro  198	48.2

  Robert A. MacLachlan (ram@cs.cmu.edu)