Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!usc!rpi!sarah!bingnews!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.arch
Subject: Re: Divide in 1 cycle (was Re: standard extensions)
Keywords: Reciprocal,Algorithm
Message-ID: <1991Mar10.201021.22058@bingvaxu.cc.binghamton.edu>
Date: 10 Mar 91 20:10:21 GMT
References: <1991Mar7.043931.13552@bingvaxu.cc.binghamton.edu> <777@spim.mips.COM> <1991Mar8.110801.20042@bingvaxu.cc.binghamton.edu>
Organization: State University of New York at Binghamton
Lines: 181

My earlier mumblings about ``interpolation'' & ``table-lookups not being quite 
as expensive as all that'' have resulted in a modest amount of email asking 
for clarification. So after a bit of experimentation I am posting yet 
_another_ reciprocal routine which attempts to illustrate some of the ideas to 
which I alluded.

This present routine uses linear interpolation (after some initial adjustment 
of the argument) to find the reciprocal of an unsigned 16-bit number. It uses 
512 words of table storage.

It is based, again, on an approximate expansion for 1/(L a+b)

	= 1/L 1/a (1 - 1/L 1/a b)

	= 1/L (A[a] - B[a] (1/L b))

	where A[a] = 1/a and B[a] = 1/a^2

It is convenient to choose L = 2^k :-).

Over the argument range the absolute error in the result is small enough to 
make any division accurate (i.e. the largest abs err * largest numerator 
< 1 bit).

Since the scheme only involves
	(c) a `small' multiply
	(a) a subtract
	(b) two `small' shifts
it is slightly :-) cheaper in terms of h/w than the scheme I posted previously.
It again could perform at the rate of one result per cycle with a suitable 
tree multiplier and shifters.

The accuracy of the calculation can be improved somewhat if the parameters 
A[a] and B[a] are displaced; I have tried a simple regression to minimize the 
square error, and that seems satisfactory; but for the numerical purists a 
minimax procedure might be better.

Although obviously a toy, expansion to your preference in wordsize is
straightforward (if expensive).

-kym

Since the abs error was originally largest for a power of two, I have 
included a shift in the program to remove trailing 0's in the initial
argument; this would probably be reduced to a single-bit shift in a hardware 
version, or eliminated entirely if the table entries A[a] and B[a] were 
adjusted appropriately.

The program also includes a possible single Newton-Raphson correction step; 
but it does not seem to be required for the purposes of implementing integer 
division.

C Code ahead.

--- program ---
#include <assert.h>
#include <math.h>
#include <stdio.h>

/*

1/(La+b) = 1/L 1/a (1 + 1/L 1/a b)^-1
	=about 1/L 1/a (1 - 1/L 1/a b)
	= 1/L 1/a - 1/L^2 1/a^2 b
	= 1/L A(a) - 1/L^2 B(a) b

Good if L = 2^k.

*/

#define MAX 0xffff
#define NTAB (1<<8)

float atab[NTAB],btab[NTAB];
int mask[10];
float shift[16];

float recip0(x) {
	int a;
	unsigned short b;
	unsigned char m=0;
	unsigned char n;

/* trim trailing 0's from initial argument */
	for(a=x; (a&1)==0; a>>=1, m++);
/* top `nbits' of x -> a ; rest -> b */
	for(n=0; a>=NTAB; a>>=1, n++);
	assert(n<=10);
	b=x & mask[n];
	m += n;
	assert(m<=15);
	return shift[m]*(atab[a] - shift[n]*btab[a]*b);
	}

#ifdef NEWTON
float recip(x) {
/* perform single Newton-Raphson correction step */
	float r = recip0(x);
	r *= 2 - r*x;
	return r;
	}
#else
/* perform no Newton-Raphson correction step */
#define recip(x) recip0(x)
#endif

main() {
	int x;
	float mine,std;
	float abs_err,rel_err;
	float prod_rel_err=1,sum_abs_err=0;
	float max_abs_err=0,min_abs_err=1e38;
	int min_abs_err_x= -1,max_abs_err_x= -1;
	float max_rel_err=0,min_rel_err=1e38;
	int min_rel_err_x= -1,max_rel_err_x= -1;

	init();

	for(x=1; x<=MAX; x++) {

		std=1.0/x;

		mine=recip(x);
		assert(mine>=0); /* just in case... */

		abs_err=fabs(mine-std);
		sum_abs_err+=abs_err;
		rel_err=abs_err/std;
		prod_rel_err *=rel_err;
		if(abs_err>max_abs_err)
			max_abs_err=abs_err,max_abs_err_x=x;
		else if(abs_err<min_abs_err)
			min_abs_err=abs_err,min_abs_err_x=x;
		if(rel_err>max_rel_err)
			max_rel_err=rel_err,max_rel_err_x=x;
		else if(rel_err<min_rel_err)
			min_rel_err=rel_err,min_rel_err_x=x;

		}

	printf("av abs err=%g ",sum_abs_err/MAX );
	printf("min abs err =%g ",min_abs_err);
	printf("(%d) ",min_abs_err_x);
	printf("max abs err =%g ",max_abs_err);
	printf("(%d) ",max_abs_err_x);
	putchar('\n');
	printf("av rel err=%g ", pow(prod_rel_err,1.0/MAX) );
	printf("min rel err =%g ",min_rel_err);
	printf("(%d) ",min_rel_err_x);
	printf("max rel err =%g ",max_rel_err);
	printf("(%d) ",max_rel_err_x);
	putchar('\n');

	exit(0);
	}

init() {
	int i;
	atab[0] = 0;
	btab[0] = 0;
	for(i=1; i<NTAB; i++) {
		atab[i] = 1.0/i;
		btab[i] = 1.0/((long)i*i);
		}
	for(i=0; i<10; i++) mask[i] = (1L<<i) - 1;
	for(i=0; i<16; i++) shift[i] = 1.0/(1L<<i);
	}


--- make #1 ---
cc recip.c -lm
a.out
--- log file #1 ---
av abs err=5.93951e-08 min abs err =0 (1) max abs err =7.59971e-06 (514)
av rel err=0 min rel err =0 (1) max rel err =0.00732425 (33008)
--- make #2 ---
cc -DNEWTON recip.c -lm
--- log file #2 ---
av abs err=1.51684e-10 min abs err =0 (1) max abs err =2.96859e-08 (514)
av rel err=0 min rel err =0 (1) max rel err =5.36168e-05 (33008)
--- end ---