Path: utzoo!mnetor!uunet!husc6!rutgers!sdcsvax!ucbvax!decvax!decwrl!spar!hunt
From: hunt@spar.SPAR.SLB.COM (Neil Hunt)
Newsgroups: comp.lang.c
Subject: Re: C machine
Message-ID: <91@spar.SPAR.SLB.COM>
Date: 1 Jan 88 21:22:45 GMT
References: <7535@alice.UUCP> <8226@steinmetz.steinmetz.UUCP> <461@auvax.UUCP> <9961@mimsy.UUCP> <166@teletron.UUCP>
Reply-To: hunt@spar.UUCP (Neil Hunt)
Organization: SPAR - Schlumberger Palo Alto Research
Lines: 73

Summary: Justification for 32 bit ints on 68k machines.

In article <166@teletron.UUCP> andrew@teletron.UUCP (Andrew Scott) writes:
>
>[...] Our 68000 compiler has 16 bit shorts, 32 bit longs (which
>make sense) and 32 bit ints (which doesn't always make sense).
>
>A lot of code I've come across uses scratch variables (array indices etc.) of
>type int.  Of course, 32 bit arithmetic must be used.

Since the 68000 has 32 bit registers, there is frequently a penalty
on operations in 16 bits - what does the compiler do about the other
16 bits in the registers? At least in the Sun compilers, it is very
hard to persuade the compiler not to put an extend `extl dn' instruction after
every load of a short variable into a register and a clear `moveq #0,dn'
instruction before each load of an unsigned short value into a register.
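
To make the "other 16 bits" problem concrete, here is a rough C model of
what those instructions do to a 32 bit data register. This is only a
sketch: the function names are made up for illustration, and the register
is modelled as a plain uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a word-sized move into a data register:
   only bits 0..15 change, bits 16..31 keep whatever was there before. */
static uint32_t model_movw(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Hypothetical model of `extl dn': copy bit 15 into bits 16..31. */
static uint32_t model_extl(uint32_t reg)
{
    if (reg & 0x8000u)
        return reg | 0xFFFF0000u;
    return reg & 0x0000FFFFu;
}

/* Hypothetical model of `moveq #0,dn': clear all 32 bits. */
static uint32_t model_moveq0(void)
{
    return 0;
}
```

A signed short load is then model_extl(model_movw(reg, v)), and an
unsigned short load is model_movw(model_moveq0(), v). In both cases one
extra instruction is spent just to make the upper half well defined.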

Another (perhaps less defensible) reason is that a lot of code
tends to be rather cavalier about exchanging pointers and ints
(particularly in function return values, for example),
and a 16 bit int would break all of this code.
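
A small sketch of the kind of breakage meant here, assuming an address is
modelled as a 32 bit value. The function names are invented for the
example; they stand in for a routine that really returns a pointer but is
declared (or left to default) as returning int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: an address passed through a 16 bit int loses bits 16..31. */
static uint32_t address_through_int16(uint32_t address)
{
    uint16_t as_int = (uint16_t)address;   /* truncation happens here */
    return as_int;
}

/* Hypothetical: with a 32 bit int the same sloppy code happens to work. */
static uint32_t address_through_int32(uint32_t address)
{
    uint32_t as_int = address;             /* nothing is lost */
    return as_int;
}
```

With an address such as 0x00012345, the 16 bit version hands back 0x2345
while the 32 bit version returns the address intact.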

>However, the 68000 has
>16 bit divide and multiply instructions, which are *much* faster than the 
>subroutine calls to the 32 bit arithmetic routines.  The case could be made
>that a 16 bit quantity is the "natural" size for arithmetic operations for
>the 68000.

Indeed the 68000/8/10/12 have a 16x16->32 bit multiply instruction,
and a function is required for a long multiply. Note however that
when both operands fit into 16 bits, this fact is quickly
discovered and the short multiply is used instead:

		jsr	lmult	; 20

lmult:					; d0 and d1 are the operands.
		movl	d2,sp@- ; 14
		movl	d0,d2	;  4
		orl	d1,d2	;  6	; OR all the bits together.
		clrw	d2	;  4	; mask bits 0..15, leaving 16..31.
		tstl	d2	;  4
		bnes	...	;  6	; if 16..31 are not zero, branch to ...
		mulu	d1,d0	; 40	; do the simple multiply
		movl	sp@+,d2	; 12
		rts		; 16

				 126 cycles

This is using 68010 timings, with some assumptions.

We see that, even counting the entire function call overhead, there is only
a factor of 3.1 between the function call and the use of the hardware
instruction directly. Things are perhaps not so bad!
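
For anyone who would rather read the fast path in C, here is a rough model
of the listing above together with the cycle arithmetic. It is a sketch
only: the function names are mine, the slow path behind the `bnes' is not
shown in the listing and is not modelled, and the cycle figures are simply
the ones quoted in the comments of the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the lmult fast path.
   Returns 1 and stores the product if both operands have bits 16..31
   clear; returns 0 where the assembly would branch to the slow path. */
static int lmult_fast_path(uint32_t d0, uint32_t d1, uint32_t *product)
{
    uint32_t d2 = d0 | d1;        /* movl d0,d2 ; orl d1,d2 */
    d2 &= 0xFFFF0000u;            /* clrw d2                */
    if (d2 != 0)                  /* tstl d2 ; bnes ...     */
        return 0;
    /* mulu d1,d0: unsigned 16x16->32 multiply */
    *product = (uint32_t)(uint16_t)d0 * (uint32_t)(uint16_t)d1;
    return 1;
}

/* Sum of the per-instruction cycle counts quoted in the listing. */
static int fast_path_cycles(void)
{
    static const int cycles[] = { 20, 14, 4, 6, 4, 4, 6, 40, 12, 16 };
    int total = 0;
    for (unsigned i = 0; i < sizeof cycles / sizeof cycles[0]; i++)
        total += cycles[i];
    return total;
}
```

The total comes to 126 against 40 for the bare `mulu', a ratio of about
3.15. Note that a negative operand has bit 31 set, so in this model it
always takes the branch to the slow path.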

The Sun compiler is also smart enough to recognise when a multiply
by a constant is possible in a 16 bit instruction, and uses it rather
than the function call in these cases.

Finally, the 68020 has three sizes of multiply instructions,
16x16->32, 32x32->32, and 32x32->64. On this machine there is little
penalty in having 32 bit ints, and the other advantages still apply.
A compiler writer aware that any 16/32 bit decision for ints would
apply across all 68k machines would probably not decide upon 16 bits
just because some of the machines are slightly slower on one instruction,
especially when all the machines would have to pay the penalty of
maintaining the high bits in the registers if 16 bits were the decision.

Neil/.

PS: Try using:
	a = (int)((short)x * (short)y);
if you really need that factor of 3 back in the multiply instruction --
On a Sun 2 this generates a `muls' instruction!
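
A self-contained version of that trick, as a sketch. It assumes int is
32 bits (as with the Sun compiler) and that x and y really do fit in a
short; the function name is made up. Whether a given compiler turns this
into a single multiply instruction is up to that compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply two values known to fit in 16 bits.
   The casts tell the compiler the operands are shorts; the product is
   still computed as a 32 bit int because of the usual promotions. */
static int32_t short_multiply(int32_t x, int32_t y)
{
    return (int32_t)((int16_t)x * (int16_t)y);
}
```

If x or y does not fit in 16 bits the cast discards the high bits and the
answer is wrong, so this is only safe where the range is known.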