Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!rpi!sarah!bingnews!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.lang.c
Subject: Re: Log Library - How is it done in the library code?
Keywords: log, library, series expansion
Message-ID: <1991Mar16.201655.6104@bingvaxu.cc.binghamton.edu>
Date: 16 Mar 91 20:16:55 GMT
References: <15438@smoke.brl.mil> <1991Mar12.014416.4289@m.cs.uiuc.edu> <702@newave.UUCP>
Organization: State University of New York at Binghamton
Lines: 145

In article <702@newave.UUCP> john@newave.mn.org (John A. Weeks III) writes:
>In article <1991Mar12.014416.4289@m.cs.uiuc.edu> joshi@m.cs.uiuc.edu (Anil Joshi) writes:
[complains about library logs]
>As an experiment, why not try running your program using log, then
>try one of the alternate solutions.  Compare the time it takes to run
>each example.  Then post your results to the net.

While I would generally agree with John -- that micro-optimization of
one or two instructions or function calls seldom (if ever) improve the
overall performance of typical programs -- I'm going to play devil's
advokate here.

The following is a simple-minded routine that uses a table lookup to
comupte an approximate natural log for 16-bit integers. It could be
easily recoded to get logs of doubles or floats, and additionally you
might like to remove the floating-point operations and replace them by
fixed-point. (In fact in a similar problem posted in comp.arch
regarding division, speeds for a similar routine approached almost twice
the speed of the FLOATING POINT HARDWARE on some machines). Another
point to consider -- the initial `find first 1' logic might be coded as
a single instruction on some machines.

The results vary over different machines -- some with fp hardware;
some with `no' hardware :-). 

The results are as follows:

Sun 3/60:	0.368515
Vax 8530:	0.421875
Sun SS1:	.6
MIPS Magnum:	1
Vax 6440:	1.64706

So we see that on _some_ hardware (like 68k's) the library routines are
at an apparent _big_ disadvantage; on some other platforms (the 6440
running VMS) the situation is reversed. I particularly like the MIPS
where things everything's equal. :-)

Although I realize I'm comparing apples to oranges here (my log routine
only handles 16-bit integers; not the whole spectrum of double-precision
arguments as does the C library routine) -- that _was_ the point of the
original posting: can we ever take advantage of the fact that what we
wish to do is somewhat less general than the library designer's
intended? As John said in his post referenced above, the answer is a
categorical ``it depends''. You be the judge from the figures above.

-kym

C program follows:

==================== log.c ====================
#define NDEBUG

#include <assert.h>
#include <stdio.h>
#include <math.h>
/*

Taylor series:

f(x+h) = f(x) + h f'(x) + 1/2 h^2 f''(x) + ...

For f(x) = log(x) we have:

f(x) = log(x)
f'(x) = 1/x
f''(x) = -1/x^2
f'''(x) = 2/x^3

Evaluate each of f's at x=1.

f(1) = 0
f'(1) = 1
f''(1) = -1
f'''(1) = 2

Therefore:

log(1+x) =
	= 0 + h 1 + 1/2 h^2 (-1) + 1/6 h^3 (2) + 1/24 h^4 (-6) ...
	= h - h^2/2 + h^3/3 - h^4/4 + ...

If we wish to find:

log(L a + b) = log(La (1 + b/(La))) = log(L) + log(a) + log(1+b/(La))

where L = 2^k we have:

log(2^k) = k log(2)

and therefore:

log(x) = k*log(2) + log(a) + log(1+b/(La))

But as we know
	log(1+b/(La)) =about b/(La) - 1/2 (b/(La))^2

therefore
log(x) =about k/lg(2) + log(a) + 1/L b/a - 1/2 1/L^2 (b/a)^2

*/

#define NTAB 256	/* size of table */
#define NSAMP 10000L	/* number of samples in test */

double atab[NTAB];

init(){
	int i;

	for(i=1;i<NTAB;i++) atab[i]=log((double)i*NTAB);
	}

double mylog(x) register x; {
	register a,b,n;

	for(n=0; x<NTAB*NTAB/2; x<<=1,n++);
	a=x/NTAB;
	b=x%NTAB;
	assert(a>=0);
	assert(a<NTAB);
	return atab[a] + (double)b/(a*NTAB) - n*M_LN2;
	}

main(){
	double sum1,sum2;
	double t0,t1,t2;
	int x;

	init();
	t0=clock();

	sum1=0;
	for(x=1;x<NSAMP;x++) sum1+=log((double)x);
	t1=clock();

	sum2=0;
	for(x=1;x<NSAMP;x++) sum2+=mylog(x);
	t2=clock();

	printf("%g	%g\n",sum1,sum2);
	printf("%g\n",(t2-t1)/(t1-t0));
	exit(0);
	}
==================== END END END ====================