Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!orion.cf.uci.edu!uci-ics!ucla-cs!math.ucla.edu!sonia!pmontgom From: pmontgom@sonia.math.ucla.edu (Peter Montgomery) Newsgroups: comp.arch Subject: Re: Double Width Integer Multiplication and Division Message-ID: <1333@sunset.MATH.UCLA.EDU> Date: 3 Jul 89 18:59:23 GMT References: <1035@aber-cs.UUCP> <1370@l.cc.purdue.edu> Sender: news@MATH.UCLA.EDU Reply-To: pmontgom@math.ucla.edu (Peter Montgomery) Organization: UCLA Mathematics Department Lines: 67 In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: >In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes: >> In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu >> (Lawrence Crowl) writes: >> >> [1] You are not supposed to write assembler programs on a RISC >> machine. The compilers have sophisticated algorithms to generate >> "optimal" multiplication/division out of simpler instructions. > >I find this attitude arrogant. Neither you nor anyone else knows what >complicated operations I want to do. Why should you try to make it >difficult for me to use the power of the computer? > Given integers A, B, C where 0 <= A, B, < C, I want to be able to find q, r such that A*B = q*C + r and 0 <= r < C. I do multiple-precision arithmetic with large numbers, and this is such an important operation that I cannot afford to call a subroutine every time I do it. So rather than have just a small assembly routine to do this function, I write the entire loop or the entire procedure in assembly code. I want to be able to define primitives like this in my language, telling the compiler which sequence of instructions to generate whenever it encounters my primitive (this sequence of instructions will be defined ONCE, in the machine dependent part of my program, but the code referencing the primitives will be scattered throughout). Many languages allow one to define user primitives in terms of other language elements (macros), but few languages allow us to go deeper and say things like (MC68020) "DEFINE QUOT_REM_64(arg1:unsigned long, register type D; arg2:unsigned long, register type D; arg3:unsigned long, register type D), RETURNS (arg4:unsigned long, register type D; arg5:unsigned long, register type D); LOCAL upper: register type D; LOCAL lower: register type D; movl arg1, lower mulul arg2, upper:lower /* 64-bit product arg1*arg2 */ divul arg3, upper:lower /* Divide by arg3 */ movl lower, arg4 /* quotient */ movl upper, arg5 /* remainder */ END QUOT_REM;" When the compiler subsequently encounters an expression like (q, r) := QUOT_REM_64(A, B, C), the compiler would evaluate A, B, and C, converting them to unsigned long if necessary. Each time these are referenced in the body, the values would be moved to a D register and the appropriate operation done. The outputs get assigned to q and r. With a good optimizing compiler, the movl's could probably be eliminated (and the compiler would be allowed to interchange arg1 and arg2 in the mulul since the instruction is computationally commutative). The programmer expresses his algorithm in terms of the available instructions, while the compiler worries about the things it is good at (e.g., storage and register allocation, common subexpression recognition, loop invariants). The body of the definition would be allowed to reference more registers than are available, with the compiler responsible for handling the overflow. Note the ASM primitive of C is unsatisfactory, for it forces the programmer to know where the compiler has put the operands. I once used FORTRAN statement functions on the Control Data 7600 to do double-length integer multiplies (the same hardware instruction was used for the upper half of floating and integer multiplications, and I was able to tell the compiler to treat my original operands as floating point without changing the bit-pattern), but nowhere else have I succeeded. -------- Peter Montgomery pmontgom@MATH.UCLA.EDU