Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!elroy.jpl.nasa.gov!swrinde!ucsd!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.arch
Subject: Re: bizarre instructions
Message-ID: <10244@dog.ee.lbl.gov>
Date: 25 Feb 91 19:27:07 GMT
References: <9102220245.AA14853@ucbvax.Berkeley.EDU> <1991Feb25.134714.23523@linus.mitre.org>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 118
X-Local-Date: Mon, 25 Feb 91 11:27:08 PST

In article <1991Feb25.134714.23523@linus.mitre.org> bs@gauss.mitre.org (Robert D. Silverman) writes:
>In article <9102220245.AA14853@ucbvax.Berkeley.EDU> JBS@IBM.COM writes:
>... what one usually wants is (A*B + C)/D and (A*B + C) mod D.
>Even on machines that support double length integer multiplies, one
>cannot put the above operations into HLL because the compiler will not
>generate the double length multiply (say 32 x 32 --> 64) nor will it
>then do the (64 /32 --> 32 bit quotient & remainder). Since A*B can overflow
>32 bits one is FORCED to call assembler routines to do this.

Ah yes. Clearly the following does not work....

/*
 * return quotient and remainder from (a*b + c) divrem d
 */
#if 0
/*
 * This is the way we would like to do it, but gcc emits one extra
 * instruction, as it is not smart enough to completely eliminate the
 * addressing on r (it uses a register for r, rather than a pointer,
 * but never quite goes all the way).
 */
static __inline int divrem(int a, int b, int c, int d, int *r)
{
	int q;
	double tmp;	/* force reg pair allocation */

	asm("emul %1,%2,%3,%0" : "=g"(tmp) : "g"(a), "g"(b), "g"(c));
	asm("ediv %3,%2,%0,%1" : "=g"(q), "=g"(*r) : "r"(tmp), "g"(d));
	return q;
}
#else
/*
 * So instead we will use rather peculiar gcc syntax.
 * Note that the macro uses a, b, c, d, q, and r exactly once each,
 * and thus side effects (*p++, etc.) are safe.
 */
#define divrem(q, r, a, b, c, d) ({ \
	double divrem_tmp; \
	asm("emul %1,%2,%3,%0" : "=g"(divrem_tmp) : \
	    "g"(a), "g"(b), "g"(c)); \
	asm("ediv %3,%2,%0,%1" : "=g"(q), "=g"(r) : \
	    "r"(divrem_tmp), "g"(d)); \
})
#endif

int a[100], b[100], c[100], d[100];
int q[100], r[100];

void doit(int n)
{
	int i;

	for (i = 0; i < n; i++) {
#if 0
		q[i] = divrem(a[i], b[i], c[i], d[i], &r[i]);
#else
		divrem(q[i], r[i], a[i], b[i], c[i], d[i]);
#endif
	}
}

But wait! Maybe, just *maybe*, we should try it out before dismissing it.

Well goll-ee, it seems to work! When compiled on a Tahoe (the Tahoe is a
`RISC'---a `Reused Instruction Set Computer'; its emul and ediv are just
like those on the VAX) with `gcc -O -S', this compiles to (compiler
comments and other drek stripped):

_doit:
	.word 0x3c0
	movl 4(fp),r4
	clrl r2
	cmpl r2,r4
	jgeq L13
	movab _a,r9
	movab _b,r8
	movab _c,r7
	movab _q,r6
	movab _r,r5
	movab _d,r3
L12:
	emul (r9)[r2],(r8)[r2],(r7)[r2],r0
	ediv (r3)[r2],r0,(r6)[r2],(r5)[r2]
	incl r2
	cmpl r2,r4
	jlss L12
L13:
	ret

(Note that the Tahoe does not have auto-increment addressing modes, and
this is in fact the best that can be done.)

On the VAX the loop changes to (gcc 1.37.1, -fstrength-reduce -mgnu):

L12:
	emul (r2)+,(r3)+,(r4)+,r0	# the registers
	ediv (r5),r0,r1,r0		# are allocated in
	movl r1,(r6)+			# a different order.
	movl r0,(r7)+
	addl2 $4,r5
	jaoblss r9,r8,L12

Apparently the machine-dependent part has not been taught to combine
`ediv' properly; it should be:

L12:
	emul (r2)+,(r3)+,(r4)+,r0
	ediv (r5)+,r0,(r6)+,(r7)+
	jaoblss r9,r8,L12

A bit of work on vax.md should fix it.

This has its drawbacks: the syntax is distinctly un-pretty, it requires
gcc, and it is machine-dependent. It does, however, work.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov