Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!sei.cmu.edu!fs7.ece.cmu.edu!o.gp.cs.cmu.edu!ram From: ram+@cs.cmu.edu (Rob MacLachlan) Newsgroups: comp.lang.lisp Subject: Re: isqrt Message-ID: <1991Jun13.003122.19863@cs.cmu.edu> Date: 13 Jun 91 00:31:22 GMT References: <676362388.61@egsgate.FidoNet.Org> Sender: netnews@cs.cmu.edu (USENET News Group Software) Organization: School of Computer Science, Carnegie Mellon Lines: 77 I have done an evaluation this "isqrt" benchmark on CMU CL systems here at CMU. I discovered that the results being computed were wrong due to bugs in division. After these problems had been solved, the times were significantly longer than those reported Miles, but still much faster than those of commercial implementations. (generally 2-5x) I'd like the remark about the apparent inconsistency of high CMU bignum primitive performance v.s. poor performance on the built-in ISQRT. Really there is no inconsistency. The bignum implementation has consistently *not* been tuned. When we first got it running, it was already much faster than the commercial implementations. As to why CMU is faster, it's hard to say for sure. It isn't algorithmic cleverness; we implemented Knuth fairly directly. My guess is that the biggest factor is compiler technology: the CMU bignum code is written in Lisp using a few extra primitives like 32 x 32 -> 64. Unlike the memory-to-memory operations described in the Lucid bignum paper, these are register-to-register operations. Python efficiently open-codes 32 bit integer operations (even using the standard CL arithmetic/logic functions.) In the case of division (which these benchmarks are intensive with), it is also possible that we have implemented the 64 / 32 divide more efficiently. This must be done as an assembly routine, since neither MIPS nor SPARC have an extended divide instruction. So what is the significance of this benchmark (and of bignum performance in general)? Well, that obviously depends on how much you use bignums. These sorts of large speedups for CMU are not seen for more conventional "symbol-crunching" Lisp programs (20% is more common...) But if you are working on mathematical or scientific software, you will like CMU CL's dramatic superiority in bignum and floating-point arithmetic. The fact that CMU bignums are written in Lisp and are public domain also makes it easy for people doing serious work with extended-precision arithmetic to efficiently add any new primitives they need. As you may have inferred from Miles's message, CMU CL is up on SUNOS Sparcs. However, we are not ready to release it yet. We will post to comp.lang.lisp. Raw data: The Lucid numbers were copied from the net. The SparcStation 1+ numbers were done at CMU using allegro 4.0.1. Both CMU and Allegro numbers are elapsed (real) time, not any sort of "CPU" time. 1000 digits system\test mid1 rec ds5000 CMU 0.38 0.09 " Lucid 1.34 0.32 ds3100 CMU 0.56 0.14 " Lucid 2.34 0.55 ss1+ CMU 0.77 0.27 " Allegro 1.65 0.38 3000 digits system\test mid1 rec ds5000 CMU 3.39 0.69 " Lucid 13.7 2.70 ds3100 CMU 4.88 1.02 " Lucid 23.9 4.71 ss1+ CMU 5.37 1.15 " Allegro 16.6 3.29 10000 digits system\test mid1 rec ds5000 CMU 38.7 10.5 " Lucid 161 39.6 ds3100 CMU 56.9 16.6 " Lucid 284 69.5 ss1+ CMU 64.5 17.6 " Allegro 198 48.2 Robert A. MacLachlan (ram@cs.cmu.edu)