Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!orion.cf.uci.edu!uci-ics!ucla-cs!math.ucla.edu!sonia!pmontgom
From: pmontgom@sonia.math.ucla.edu (Peter Montgomery)
Newsgroups: comp.arch
Subject: Re: Double Width Integer Multiplication and Division
Message-ID: <1333@sunset.MATH.UCLA.EDU>
Date: 3 Jul 89 18:59:23 GMT
References: <1035@aber-cs.UUCP> <1370@l.cc.purdue.edu>
Sender: news@MATH.UCLA.EDU
Reply-To: pmontgom@math.ucla.edu (Peter Montgomery)
Organization: UCLA Mathematics Department
Lines: 67

In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>> In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
>> (Lawrence Crowl) writes:
>> 
>> 	[1] You are not supposed to write assembler programs on a RISC
>> 	    machine. The compilers have sophisticated algorithms to generate
>> 	    "optimal" multiplication/division out of simpler instructions.
>
>I find this attitude arrogant.  Neither you nor anyone else knows what 
>complicated operations I want to do.  Why should you try to make it 
>difficult for me to use the power of the computer?
>
	Given integers A, B, C where  0 <= A, B, < C,  I want to be able to
find q, r such that A*B = q*C + r and 0 <= r < C.  I do multiple-precision
arithmetic with large numbers, and this is such an important operation that
I cannot afford to call a subroutine every time I do it.
So rather than have just a small assembly routine to do this function,
I write the entire loop or the entire procedure in assembly code.

	I want to be able to define primitives like this in my language,
telling the compiler which sequence of instructions to generate whenever it
encounters my primitive (this sequence of instructions will be defined
ONCE, in the machine dependent part of my program, but the code
referencing the primitives will be scattered throughout).  Many languages
allow one to define user primitives in terms of other language elements
(macros), but few languages allow us to go deeper and say things like (MC68020)

	"DEFINE QUOT_REM_64(arg1:unsigned long, register type D;
	                    arg2:unsigned long, register type D;
			    arg3:unsigned long, register type D),
	        RETURNS    (arg4:unsigned long, register type D;
			    arg5:unsigned long, register type D);
		LOCAL upper: register type D;
		LOCAL lower: register type D;
			movl  arg1, lower
			mulul arg2, upper:lower  /* 64-bit product arg1*arg2 */
			divul arg3, upper:lower  /* Divide by arg3 */
			movl  lower, arg4	 /* quotient */
			movl  upper, arg5	 /* remainder */
	END QUOT_REM;"

	When the compiler subsequently encounters an expression like 
(q, r) := QUOT_REM_64(A, B, C), the compiler would evaluate A, B, and C, 
converting them to unsigned long if necessary.  Each time these are 
referenced in the body, the values would be moved to a D register and 
the appropriate operation done.  The outputs get assigned to q and r. 
With a good optimizing compiler, the movl's could probably be eliminated
(and the compiler would be allowed to interchange arg1 and arg2 in the 
mulul since the instruction is computationally commutative).  The 
programmer expresses his algorithm in terms of the available instructions, 
while the compiler worries about the things it is good at (e.g., storage 
and register allocation, common subexpression recognition, loop invariants).  
The body of the definition would be allowed to reference more registers than 
are available, with the compiler responsible for handling the overflow.

	Note the ASM primitive of C is unsatisfactory, for it forces
the programmer to know where the compiler has put the operands.  I once
used FORTRAN statement functions on the Control Data 7600 to do
double-length integer multiplies (the same hardware instruction was
used for the upper half of floating and integer multiplications, and
I was able to tell the compiler to treat my original operands
as floating point without changing the bit-pattern), but nowhere
else have I succeeded. 
--------
        Peter Montgomery
        pmontgom@MATH.UCLA.EDU