Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!ubvax!ardent!mac
From: mac@mrk.ardent.com (Michael McNamara)
Newsgroups: comp.arch
Subject: Re: Double Width Integer Multiplication and Division
Message-ID: <MAC.89Jul5102157@mrk.ardent.com>
Date: 5 Jul 89 17:21:57 GMT
References: <1035@aber-cs.UUCP> <1370@l.cc.purdue.edu>
	<1333@sunset.MATH.UCLA.EDU>
Sender: news@ardent.UUCP
Organization: Ardent Computer Corporation, Sunnyvale, CA
Lines: 212
In-reply-to: pmontgom@sonia.math.ucla.edu's message of 3 Jul 89 18:59:23 GMT


	[ An ongoing discussion of the lack of certain interesting
math primitives in HLLs, the difficulty of assembly programming on RISC
chips, and the need for extensible HLLs ].

	On the extensible HLL front, why not go to the authors of the
extensible editor?  The FSF's C compiler, GCC, has extended asm macro
support which allows you to symbolically hook up your asm routines to HLL
variables.  An excerpt from the gcc manual appears below.  You can
obtain gcc via ftp from a number of sites, as well as via uucp from
osu-cis.

	Note that this is a rather long posting, but gcc is USEFUL: it
lets mortals use the whole machine...

>  In article <1333@sunset.MATH.UCLA.EDU> pmontgom@sonia.math.ucla.edu
>  (Peter Montgomery) writes:
>  	Given integers A, B, C where  0 <= A, B < C,  I want to be able to
>  find q, r such that A*B = q*C + r and 0 <= r < C.  I do multiple-precision
>  arithmetic with large numbers, and this is such an important operation that
>  I cannot afford to call a subroutine every time I do it.
>  So rather than have just a small assembly routine to do this function,
>  I write the entire loop or the entire procedure in assembly code.
>  
>  	I want to be able to define primitives like this in my language,
>  telling the compiler which sequence of instructions to generate whenever it
>  encounters my primitive (this sequence of instructions will be defined
>  ONCE, in the machine dependent part of my program, but the code
>  referencing the primitives will be scattered throughout).  Many languages
>  allow one to define user primitives in terms of other language elements
>  (macros), but few languages allow us to go deeper and say things like (MC68020)
>  
>  	"DEFINE QUOT_REM_64(arg1:unsigned long, register type D;
>  	                    arg2:unsigned long, register type D;
>  			    arg3:unsigned long, register type D),
>  	        RETURNS    (arg4:unsigned long, register type D;
>  			    arg5:unsigned long, register type D);
>  		LOCAL upper: register type D;
>  		LOCAL lower: register type D;
>  			movl  arg1, lower
>  			mulul arg2, upper:lower  /* 64-bit product arg1*arg2 */
>  			divul arg3, upper:lower  /* Divide by arg3 */
>  			movl  lower, arg4	 /* quotient */
>  			movl  upper, arg5	 /* remainder */
>  	END QUOT_REM_64;"
>  
>  	When the compiler subsequently encounters an expression like 
>  (q, r) := QUOT_REM_64(A, B, C), the compiler would evaluate A, B, and C, 
>  converting them to unsigned long if necessary.  Each time these are 
>  referenced in the body, the values would be moved to a D register and 
>  the appropriate operation done.  The outputs get assigned to q and r. 
>  With a good optimizing compiler, the movl's could probably be eliminated
>  (and the compiler would be allowed to interchange arg1 and arg2 in the 
>  mulul since the instruction is computationally commutative).  The 
>  programmer expresses his algorithm in terms of the available instructions, 
>  while the compiler worries about the things it is good at (e.g., storage 
>  and register allocation, common subexpression recognition, loop invariants).  
>  The body of the definition would be allowed to reference more registers than 
>  are available, with the compiler responsible for handling the overflow.
>  
>  	Note the ASM primitive of C is unsatisfactory, for it forces
>  the programmer to know where the compiler has put the operands.  I once
>  used FORTRAN statement functions on the Control Data 7600 to do
>  double-length integer multiplies (the same hardware instruction was
>  used for the upper half of floating and integer multiplications, and
>  I was able to tell the compiler to treat my original operands
>  as floating point without changing the bit-pattern), but nowhere
>  else have I succeeded. 
>  --------
>          Peter Montgomery
>          pmontgom@MATH.UCLA.EDU 

From the GCC info node, Extended Asm Support:

Assembler Instructions with C Expression Operands
=================================================

In an assembler instruction using `asm', you can now specify the
operands of the instruction using C expressions.  This means no more
guessing which registers or memory locations will contain the data you want
to use.

You must specify an assembler instruction template much like what appears
in a machine description, plus an operand constraint string for each
operand.

For example, here is how to use the 68881's `fsinx' instruction:

     asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));

Here `angle' is the C expression for the input operand while
`result' is that of the output operand.  Each has `"f"' as its
operand constraint, saying that a floating-point register is required.  The
constraints use the same language used in the machine description
(*Note Constraints::).
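
As an illustration that is not part of the manual text, the same pattern
can be sketched for a more common machine.  The helper name `hw_sqrt' is
made up for this example, and the `"x"' and `"w"' constraints are assumed
to be the SSE and floating-point register constraints of the x86-64 and
AArch64 ports.  The `__asm__' spelling is used because it is also accepted
when the compiler runs in a strict ISO C mode.

```c
/* Hypothetical helper, not from the GCC manual: square root computed by a
   single hardware instruction, with registers chosen by the compiler. */
static inline double hw_sqrt(double value)
{
    double result;
#if defined(__GNUC__) && defined(__x86_64__)
    /* %0 is the output, %1 the input; "x" asks for an SSE register. */
    __asm__ ("sqrtsd %1,%0" : "=x" (result) : "x" (value));
#elif defined(__GNUC__) && defined(__aarch64__)
    /* "w" asks for a floating-point register; %d0 prints its d-name. */
    __asm__ ("fsqrt %d0,%d1" : "=w" (result) : "w" (value));
#else
    /* Fallback for other GNU targets: let the compiler pick the code. */
    result = __builtin_sqrt(value);
#endif
    return result;
}
```

A call such as `hw_sqrt (x)' then behaves like any other C expression; the
compiler decides which registers hold `value' and `result'.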

Each operand is described by an operand-constraint string followed by the C
expression in parentheses.  A colon separates the assembler template from
the first output operand, and another separates the last output operand
from the first input, if any.  Commas separate output operands and separate
inputs.  The number of operands is limited to the maximum number of
operands in any instruction pattern in the machine description.

Output operand expressions must be lvalues; the compiler can check this.
The input operands need not be lvalues.  The compiler cannot check whether
the operands have data types that are reasonable for the instruction being
executed.  It does not parse the assembler instruction template and does
not know what it means, or whether it is valid assembler input.  The
extended `asm' feature is most often used for machine instructions
that the compiler itself does not know exist.

If there are no output operands, and there are input operands, then you
should write two colons in a row where the output operands would go.

The output operands must be write-only; GNU CC will assume that the values
in these operands before the instruction are dead and need not be
generated.  For an operand that is read-write, or in which not all bits are
written and the other bits contain useful information, you must logically
split its function into two separate operands, one input operand and one
write-only output operand.  The connection between them is expressed by
constraints which say they need to be in the same location when the
instruction executes.  You can use the same C expression for both operands,
or different expressions.  For example, here we write the (fictitious)
`combine' instruction with `bar' as its read-only source operand
and `foo' as its read-write destination:

     asm ("combine %2,%0" : "=r" (foo) : "0" (foo), "g" (bar));

The constraint `"0"' for operand 1 says that it must occupy the same
location as operand 0.

Only a digit in the constraint can guarantee that one operand will be in
the same place as another.  The mere fact that `foo' is the value of
both operands is not enough to guarantee that they will be in the same
place in the generated assembler code.  The following would not work:

     asm ("combine %2,%0" : "=r" (foo) : "r" (foo), "g" (bar));

Various optimizations or reloading could cause operands 0 and 1 to be in
different registers; GNU CC knows no reason not to do so.  For example, the
compiler might find a copy of the value of `foo' in one register and
use it for operand 1, but generate the output operand 0 in a different
register (copying it afterward to `foo''s own address).  Of course,
since the register for operand 1 is not even mentioned in the assembler
code, the result will not work, but GNU CC can't tell that.
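
Since `combine' is fictitious, here is a rough sketch of the same
read-write idiom with a real instruction, the two-operand x86 `addl'.  It
is not from the manual: the helper name `add_in_place' is invented, and
other targets simply fall back to plain C.

```c
/* Hypothetical helper: foo += bar, written as a two-operand add whose
   destination register is both read and written. */
static inline unsigned int add_in_place(unsigned int foo, unsigned int bar)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("addl %2,%0"
             : "=r" (foo)            /* operand 0: write-only output       */
             : "0" (foo), "g" (bar)  /* operand 1 must share operand 0's
                                        location; operand 2 is the source  */
             : "cc");                /* addl changes the condition codes   */
#else
    foo += bar;                      /* plain C on other targets */
#endif
    return foo;
}
```

Replacing `"0"' with `"r"' here would reproduce the broken case described
above: the template never mentions `%1', so the old value of `foo' could
sit in a register the instruction never reads.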

Unless an output operand has the `&' constraint modifier, GNU CC may
allocate it in the same register as an unrelated input operand, on the
assumption that the inputs are consumed before the outputs are produced.
This assumption may be false if the assembler code actually consists of
more than one instruction.  In such a case, use `&' for each output
operand that may not overlap an input.  *Note Modifiers::.
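
A minimal sketch of where `&' matters, assuming an x86 target (the helper
name `add_two_step' is invented for this example).  The template holds two
instructions, and the output is written by the first one before the second
input is read by the second one.

```c
/* Hypothetical helper: a + b computed in two instructions. */
static inline unsigned int add_two_step(unsigned int a, unsigned int b)
{
    unsigned int out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %1,%0\n\t"   /* out = a: the output is written here ...  */
             "addl %2,%0"       /* ... but b is not read until here         */
             : "=&r" (out)      /* `&': out may not share a register with
                                   either input                             */
             : "r" (a), "r" (b)
             : "cc");           /* addl changes the condition codes         */
#else
    out = a + b;                /* plain C on other targets */
#endif
    return out;
}
```

Without the `&', the compiler would be free to place `out' and `b' in the
same register, and the `movl' would then destroy `b' before the `addl'
could use it.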

Some instructions clobber specific hard registers.  To describe this,
write a third colon after the input operands, followed by the names of
the clobbered hard registers (given as strings).  For example, on the VAX,

     asm volatile ("movc3 %0,%1,%2"
                   : /* no outputs */
                   : "g" (from), "g" (to), "g" (count)
                   : "r0", "r1", "r2", "r3", "r4", "r5");
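
For readers without a VAX, a comparable sketch on x86 (an assumption of
this example, not manual text) is the one-operand `mull', which always
leaves the upper half of the product in `edx' whether or not it is wanted.
The helper name `mul_low' is invented.

```c
/* Hypothetical helper: low 32 bits of a * b via the x86 `mull'. */
static inline unsigned int mul_low(unsigned int a, unsigned int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int low;
    __asm__ ("mull %2"            /* edx:eax = eax * b                     */
             : "=a" (low)         /* low half comes back in eax            */
             : "0" (a), "r" (b)   /* a starts out in eax                   */
             : "edx", "cc");      /* high half lands in edx, so declare it
                                     clobbered; flags change as well       */
    return low;
#else
    return a * b;                 /* plain C on other targets */
#endif
}
```

Listing `edx' as clobbered keeps the compiler from placing `b', or any
value that is live across the asm, in that register.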

Usually the most convenient way to use these `asm' instructions is to
encapsulate them in macros that look like functions.  For example,

     #define sin(x)       \
     ({ double __value, __arg = (x);   \
        asm ("fsinx %1,%0": "=f" (__value): "f" (__arg));  \
        __value; })

Here the variable `__arg' is used to make sure that the instruction
operates on a proper `double' value, and to accept only those
arguments `x' which can convert automatically to a `double'.

Another way to make sure the instruction operates on the correct data type
is to use a cast in the `asm'.  This is different from using a
variable `__arg' in that it converts more different types.  For
example, if the desired type were `int', casting the argument to
`int' would accept a pointer with no complaint, while assigning the
argument to an `int' variable named `__arg' would warn about
using a pointer unless the caller explicitly casts it.
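
The macro style can be sketched with a real instruction as well.  The
following is an illustration only, assuming an x86 target and the `bswap'
instruction; the macro name `SWAP32' and the temporary `swap32_tmp_' are
invented here, and other targets fall back to the GNU builtin
`__builtin_bswap32'.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical macro: byte-reverse a 32-bit value.  The temporary gives the
   argument a definite type, and its unusual name avoids clashing with a
   variable used in the caller's argument expression. */
#define SWAP32(x)                                                    \
    ({ unsigned int swap32_tmp_ = (x);                               \
       __asm__ ("bswap %0" : "=r" (swap32_tmp_) : "0" (swap32_tmp_)); \
       swap32_tmp_; })
#else
/* Fallback for other GNU targets. */
#define SWAP32(x) __builtin_bswap32(x)
#endif
```

Because the argument is assigned to an `unsigned int' temporary rather
than cast, passing a pointer to `SWAP32' draws a diagnostic instead of
being silently converted.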

GNU CC assumes for optimization purposes that these instructions have no
side effects except to change the output operands.  This does not mean that
instructions with a side effect cannot be used, but you must be careful,
because the compiler may eliminate them if the output operands aren't used,
or move them out of loops, or replace two with one if they constitute a
common subexpression.  Also, if your instruction does have a side effect on
a variable that otherwise appears not to change, the old value of the
variable may be reused later if it happens to be found in a register.

You can prevent an `asm' instruction from being deleted, moved or
combined by writing the keyword `volatile' after the `asm'.  For
example:

     #define set_priority(x)  \
     asm volatile ("set_priority %0": /* no outputs */ : "g" (x))
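
Since `set_priority' is not a real instruction, here is a hedged sketch
with one that is, assuming an x86 target: `rdtsc' reads a counter that
changes between calls.  The helper name `read_cycle_counter' is invented,
and the non-x86 branch is only a stand-in so that the example compiles
elsewhere.

```c
/* Hypothetical helper: read the x86 time-stamp counter. */
static inline unsigned long long read_cycle_counter(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int lo, hi;
    /* The asm has no inputs, so without `volatile' two reads could be
       treated as a common subexpression and merged into one. */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long) hi << 32) | lo;
#else
    /* Stand-in for other targets: a simple call counter, not a timer. */
    static unsigned long long fake_ticks;
    return ++fake_ticks;
#endif
}
```

Timing a loop with two calls to this helper only makes sense if both reads
really happen, and happen where they are written, which is what `volatile'
asks for.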

It is a natural idea to look for a way to give access to the condition
code left by the assembler instruction.  However, when we attempted to
implement this, we found no way to make it work reliably.  The problem
is that output operands might need reloading, which would result in
additional following "store" instructions.  On most machines, these
instructions would alter the condition code before there was time to
test it.  This problem doesn't arise for ordinary "test" and
"compare" instructions because they don't have any output operands.

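Coming back to Peter Montgomery's QUOT_REM_64: a rough sketch of how it
might look with the extended asm described above.  This is only an
illustration under stated assumptions.  It uses the x86 `mull'/`divl' pair
instead of his 68020 instructions, the `"a"' and `"d"' constraints are
assumed to name eax and edx as in the x86 port, and the function name
`quot_rem_64' and the portable fallback are invented for the example.

```c
#include <stdint.h>

/* Hypothetical helper: find q, r with a*b == q*c + r and 0 <= r < c.
   Precondition from the quoted article: a < c and b < c, which also
   guarantees that q fits in 32 bits (otherwise divl would trap). */
static inline void quot_rem_64(uint32_t a, uint32_t b, uint32_t c,
                               uint32_t *q, uint32_t *r)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t quot, rem;
    __asm__ ("mull %3\n\t"        /* edx:eax = eax * b  (64-bit product)  */
             "divl %4"            /* eax = quotient, edx = remainder      */
             : "=&a" (quot),      /* `&': both outputs are written by the */
               "=&d" (rem)        /* mull, before c is read by the divl   */
             : "0" (a), "r" (b), "r" (c)
             : "cc");
    *q = quot;
    *r = rem;
#else
    uint64_t product = (uint64_t) a * b;   /* plain C on other targets */
    *q = (uint32_t) (product / c);
    *r = (uint32_t) (product % c);
#endif
}
```

The caller writes `quot_rem_64 (A, B, C, &q, &r);' and never has to know
where the compiler put A, B or C; the only registers named are the ones
the instructions themselves insist on.
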

--
_________________
Michael McNamara 
  mac@ardent.com