Path: utzoo!mnetor!uunet!husc6!bbn!uwmcsd1!ig!agate!ucbvax!hplabs!sdcrdcf!sdcsmb!sea!eggert
From: eggert@sea.sm.unisys.com (Paul Eggert)
Newsgroups: comp.lang.lisp
Subject: Re: Lisp vs. C Floating Point (Suns)
Message-ID: <29@sea.sm.unisys.com>
Date: 7 Feb 88 05:20:38 GMT
References: <557@spar.SPAR.SLB.COM>
Reply-To: eggert@sea.sm.unisys.com (Paul Eggert)
Organization: Unisys Santa Monica
Lines: 107

In article <557@spar.SPAR.SLB.COM> malcolm@spar.slb.com (Malcolm Slaney) writes:

	The only way to really compare two languages for performance is to lock
	two hackers into two different rooms, feed them equal amounts of
	caffeinated soda (:-) and see which one is faster after a month.

I wonder whether Slaney would still say this if he were locked in a room with
the job of translating dhrystone into Lisp? (:-)  I still don't think this
benchmark is a good one.  But I'll play the game by his rules for a few
seconds.  If minor changes to code are permitted (see the end of this note for
details), then plain Sun C can run the FFT benchmark about a third faster than
Lucid 2.1.1:

	Time (in seconds) to execute 10 iterations of a 1024 point FFT
		Sun-3/160 68881
		single	double
C (SunOS 3.4)	3.5	3.7
Lucid 2.1.1	4.7	?

Slaney also writes:

	... we *were* seeing floating point run 10 times slower than Lisp
	because of the need for boxing (tags) and no type propagation.  I'm
	VERY happy to see that the lisp compilers are improving so much....

I'm also happy to see Lucid Lisp improving, and it's important to say that
floating point need not cause one to shun Lisp.  But I'm not yet convinced that
Lucid Lisp and Sun C have similar floating point performance, even ignoring the
35% performance difference reported above.  First, no Lucid times for
FPA-equipped Sun-3s or for Sun-4s were reported; what is the problem here?
Second, many Lisp systems don't support fast double precision, which is crucial
for many applications.  Can the question mark in the table above be replaced by
a hard number, so that we can see how well Lucid handles double precision?

----
The following changes to Slaney's (original) benchmark produce the C timings
reported above.  The changes to lines 178, 180, and 264 fix bugs that don't
affect CPU time -- but they lead me to suspect that there are more bugs!

18d17
< float fft_re[1025], fft_im[1025];
19a19,22
> #ifndef real
> #define real float
> #endif
> real fft_re[1025], fft_im[1025];
31c34
< float areal[], aimag[];
---
> real areal[], aimag[];
47,51c50,55
< 	register float *ar = areal, *ai = aimag;
< 	register int	i = 1, j = 0, k = 0, m = 0; 
< 	int	n = 1024, nv2 = 512, le = 0,
< 		le1 = 0, ip = 0;
< 	float	ur = 0.0, ui = 0.0, wr = 0.0, wi = 0.0, tr = 0.0, ti = 0.0;
---
> 	register int i = 1, ip;
> 	register real *ar = areal, *ai = aimag;
> 	register double r, s, ur, ui, tr, ti;
> 	register int le1, le, n = 1024, j, k, m = 0;
> 	register int nv2 = n>>1;
> 	register double wr, wi;
169,174c173,182
< 				tr = ar[ip]*ur - ai[ip] * ui;
< 				ti = ar[ip]*ui + ai[ip] * ur;
< 				ar[ip] = ar[i] - tr;
< 				ai[ip] = ai[i] - ti;
< 				ar[i] += tr;
< 				ai[i] += ti;
---
> 				r = ar[ip];
> 				s = ai[ip];
> 				tr = r*ur - s*ui;
> 				ti = r*ui + s*ur;
> 				r = ar[i];
> 				s = ai[i];
> 				ar[ip] = r - tr;
> 				ai[ip] = s - ti;
> 				ar[i] = r += tr;
> 				ai[i] = s += ti;
178c186
< 		ti = ur * wi + wi * wr;
---
> 		ui = ur * wi + ui * wr;
180d187
< 		ui = ti;
229c236
<  float	theta, phase;
---
>  double	theta, phase;
231c238
< 	float	f, c, s;
---
> 	double	f, c, s;
237c244
< 		float	x;
---
> 		double	x;
261c268
< 		float	re, im;
---
> 		double	re, im;
264c271
< 		if (abs(re) > fft_delta || abs(im) > fft_delta)
---
> 		if (fabs(re) > fft_delta || fabs(im) > fft_delta)