Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!intelca!mipos3!omepd!jimv
From: jimv@omepd (Jim Valerio)
Newsgroups: comp.lang.c
Subject: Re: Inline assembler; a quiz  (long; sorry)
Message-ID: <829@omepd>
Date: Sun, 21-Jun-87 22:10:52 EDT
Article-I.D.: omepd.829
Posted: Sun Jun 21 22:10:52 1987
Date-Received: Tue, 23-Jun-87 01:34:43 EDT
References: <608@zen.UUCP> <2299@hoptoad.uucp> <21211@sun.uucp> <464@winchester.UUCP>
Reply-To: jimv@omepd.UUCP (Jim Valerio)
Organization: Intel Corp., Hillsboro
Lines: 332

This is a paraphrase and clarification of some mail that I sent to John Mashey,
which he urged me to post to the net.

In article <464@winchester.UUCP> mash@winchester.UUCP (John Mashey) argues
against inline assembly language:
>And finally, "asm" and global-optimizing compilers are fairly
>contradictory.  A good optimizer:
>	a) Mostly ignores register declarations. It allocates
>	registers as appropriate.  The same variable may well appear
>	in several different places during the code.
>	b) May, given slightly different source code, rearrange the
>	register use substantially.
>	c) Will find it VERY hard to figure out what an arbitrary
>	"asm statement" is doing, in terms of side-effects.

I don't believe that inline assembly language and globally optimizing compilers
need to be contradictory.  In particular, the compiler I've been using here
implements inline assembly language in such a way that (a) and (b) are transparent,
and (c) is not a problem in practice.

When I say inline assembly language, what I mean is not the usage found, for
example, in 4.3bsd Unix, where an arbitrary string is randomly inserted between C
statements.  The compiler I've been using here implements inline assembly language
functions using a construction similar to the asm function declarations found in
the PCC2 (?) C compiler.  A sample of our syntax and usage can be found at the end
of this article.

Basically, what the programmer does is declare a function in the usual C
way, but with the special storage class "asm".  The body of the function is coded
as a set of assembly language templates, one of which is selected by the compiler
depending on the storage class of the operands and results.
Registers, temporary storage, local labels, and so on are declared by name,
and the compiler just uses its regular allocation schemes and naming conventions
to generate the true names.
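
As a rough model of the selection step, here is a sketch in ordinary C.
It is not taken from any compiler: the type and function names are invented
for illustration, only one operand is modeled, and the first-match rule is
an assumption (the sample at the end of this article only shows that a
`%error;` line catches the case where nothing matches).

```c
#include <stddef.h>

/* Hypothetical operand classes, loosely named after the template keywords. */
enum opclass { OC_VOID, OC_CONST, OC_REGLIT, OC_TMPREG };

/* One template: the classes it accepts and the text it would emit. */
struct asm_template {
    enum opclass ret;    /* class required of the result */
    enum opclass arg;    /* class required of the single operand */
    const char *body;    /* assembly text for this case */
};

/*
 * Return the body of the first template whose classes match the call
 * site, or NULL when nothing matches (the "%error;" case).
 * First-match order is an assumption of this sketch.
 */
static const char *
select_template(const struct asm_template *t, size_t n,
                enum opclass ret, enum opclass arg)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (t[i].ret == ret && t[i].arg == arg)
            return t[i].body;
    return NULL;
}
```

A caller would build one table per asm function and treat a NULL result as
a compile-time error at the call site.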

The programmer is expected to follow certain sensible rules when writing the code
templates.  For example, the source operands to the function may not be modified
unless the compiler matches a template saying that the operand is a temporary.
Similarly, the compiler has sensible rules to follow when generating code that
interacts with code in asm functions.  For example, when the compiler cannot or
chooses not to recognize the side effects of the instructions in the
selected asm template, it makes a worst-case assumption about side effects and
performs the cleanup actions it would do if a subroutine call were emitted.


I've found this asm function facility useful in a variety of situations.
I don't use it for inline functions that I can write in C, since we have
a separate facility for that (thanks, Steve).  Instead, I use it in the
following ways (and more).

1)  I've used it most in our floating-point math library, to cause calls
    to some basic floating-point function to emit the processor's equivalent
    instruction inline.

2)  I've used it to get at processor control registers, such as examining
    floating-point exception flags.

3)  I've used it to do double-precision and quad-precision integer arithmetic
    by defining the simple asm functions that get at the add-with-carry
    sorts of instructions, and the extended multiply and divide instructions.

4)  Last, I've found the asm functions to be useful for generating particular
    test cases for odd instruction combinations or instructions not normally
    generated by our compiler; they let me do most of the work in our regular
    programming environment and dive down to assembly language only for the
    particular case I care about.

Thus, in the first 3 cases the benefit is efficiency without sacrificing the
advantages of writing most of the code in C, and in the last case the benefit
is programmer convenience for knock-off programs.
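
To make case (3) concrete, here is roughly what a two-word add looks like in
portable C with no access to the carry flag.  The struct and function names
are invented for this illustration.  The point is only that the carry has to
be recomputed with a comparison, which an add-with-carry instruction reached
through an asm function would provide directly.

```c
#include <stdint.h>

/* Hypothetical two-word integer: lo holds bits 0..31, hi holds bits 32..63. */
struct dword { uint32_t lo, hi; };

static struct dword
dword_add(struct dword a, struct dword b)
{
    struct dword r;
    uint32_t carry;

    r.lo = a.lo + b.lo;      /* wraps modulo 2^32 */
    carry = r.lo < a.lo;     /* wrapped iff the sum is smaller than an addend */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```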

In his mail to me, John argues that these 4 cases are less important to them
because in their implementation:
    a) leaf procedures like these only have a 2-cycle overhead,
    b) in most cases, features that cannot be gotten to from a high-level
       language don't exist,
    c) the compiler would need to know a great deal about the side effects,
       and a function call automatically has the right effects.

In our case, (c) is the same, (a) is nearly the same, and (b) is apparently
a legitimate difference.  (I suspect that the true cost of leaf procedures
is more than 2 cycles here, when you count overhead due to i-cache misses,
software calling conventions required for separate linking, and whatever.)


In conclusion (and despite the reasons John gives), I feel that the inline
assembly language function facility was well worth its implementation cost.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

--- appended example, as promised above ---

/*
 * You will find below fragments of the standard header file that defines
 * the many inline and asm functions used in our floating-point support
 * library, followed by a sample function that uses these (and other)
 * functions.  In the "generic" function fp_exp, there is only 1 true subroutine
 * call: fp_fault.  All the other functions are either inline functions or
 * asm functions.
 */

/*
 * Return the current floating-point environment.
 */
asm
fp_env
fp_getenv(void)
{
%reglit return;
	modac	0,0,return
%error;
}

/*
 * Set the floating-point environment to `env',
 * and return the previous environment.
 */
asm
fp_env
fp_setenv(fp_env env)
{
%reglit return; reglit env;
	ldconst	0xdf1f0000,return
	modac	return,env,return
%error;
}


/*
 * Return the current exception flags.
 */
inline
fp_except
fp_getflags(void)
{
	register fp_env env;

	env = fp_getenv();
	return env.flags;
}

/*
 * Set exception flags to `flags', and return the previous
 * flags.
 */
asm
fp_except
fp_setflags(fp_except flags)
{
/*
 * Optimized cases when return value is ignored.
 */
%void return; const(0) flags; tmpreg mask; /* FPX_NONE */
	ldconst	31<<16,mask
	modac	mask,0,mask

%void return; const flags; tmpreg mask,tflags;
	ldconst	31<<16,mask
	ldconst	flags<<16,tflags
	modac	mask,tflags,mask

%void return; tmpreg flags; tmpreg mask;
	ldconst	31<<16,mask
	shlo	16,flags,flags
	modac	mask,flags,mask

%void return; reglit flags; tmpreg mask,tflags;
	ldconst	31<<16,mask
	shlo	16,flags,tflags
	modac	mask,tflags,mask

/*
 * Same as above, but return value required.
 */
%reglit return; const(0) flags; /* FPX_NONE */
	ldconst	31<<16,return
	modac	return,0,return
	shlo	11,return,return
	shro	27,return,return

%reglit return; const flags; tmpreg tflags;
	ldconst	31<<16,return
	ldconst	flags<<16,tflags
	modac	return,tflags,return
	shlo	11,return,return
	shro	27,return,return

%reglit return; tmpreg flags;
	ldconst	31<<16,return
	shlo	16,flags,flags
	modac	return,flags,return
	shlo	11,return,return
	shro	27,return,return

%reglit return; reglit flags; tmpreg tflags;
	ldconst	31<<16,return
	shlo	16,flags,tflags
	modac	return,tflags,return
	shlo	11,return,return
	shro	27,return,return

%error;
}
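
For readers who do not know the instruction set, the bit manipulation in
fp_setflags can be modeled in plain C.  This is a sketch under an assumption
read off the templates above: that `modac mask,src,dst` replaces the
control-register bits selected by `mask` with the matching bits of `src` and
leaves the previous register contents in `dst`.  The variable `ac` and both
function names are stand-ins made up for this model.

```c
#include <stdint.h>

static uint32_t ac;    /* stand-in for the control register */

/*
 * Assumed semantics: replace the bits of ac selected by mask with the
 * corresponding bits of src, and return the previous contents.
 */
static uint32_t
modac(uint32_t mask, uint32_t src)
{
    uint32_t old = ac;

    ac = (src & mask) | (old & ~mask);
    return old;
}

/* C rendering of the "return value required" case of fp_setflags. */
static uint32_t
set_flags(uint32_t flags)
{
    uint32_t old = modac(UINT32_C(31) << 16, flags << 16);

    old <<= 11;          /* shlo 11: bit 20 moves up to bit 31 */
    return old >> 27;    /* shro 27: bits 27..31 move down to bits 0..4 */
}
```

The shift pair isolates the 5-bit flag field held in bits 16 through 20 of
the old register value.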

/*
 * Restore the previously saved environment `env', adding in whatever
 * masked exceptions (from `flags') that have occurred.
 * The unmasked exceptions (i.e. traps) are returned.
 */
inline
fp_except
fp_restorenv(fp_env env, fp_except flags)
{
	register fp_except masks;

	masks = env.masks;
	env.flags |= flags & masks;
	fp_setenv(env);
	return (flags &~ masks);
}

/*
 * Scale `x' by `i' (i.e. multiply x by 2^i).
 */
asm
float
fps_scalb(float x, int i)
{
%reglit return; reglit x, i;
	scaler	i,x,return
%error;
}

asm
double
fpd_scalb(double x, int i)
{
%reglit(2) return; reglit(2) x; reglit i;
	scalerl	i,x(0),return(0)
%error;
}


/*
 * The exponential function e^x is approximated using the mathematical
 * identity:
 *	e^x = 2^(log2(e) * x)
 * Since the underlying approximation function is 2^f - 1, for abs(f) <= 0.5,
 * the actual algorithm used is
 *	e^x = scale( (2^f - 1) + 1, I)
 * where (log2(e) * x) = I+f and I is the (IEEE) nearest integer to
 * (log2(e) * x).
 */
GENERIC
fp_exp(GENERIC x)
{
	register GENERIC r;
	register fp_except traps;
	register fp_env env;

	switch (fp_class(x)) {

	case FPC_POSZERO:
	case FPC_NEGZERO:
		return (GENERIC)1.0;

	case FPC_POSINF:
		return x;

	case FPC_NEGINF:
		return (GENERIC)0.0;

	case FPC_POSQNAN:
	case FPC_NEGQNAN:
	case FPC_POSSNAN:
	case FPC_NEGSNAN:
		env = fp_setenv(FP_DEFENV);
		r = fp_nan1(x); /* propagate NaN */
		break;

	case FPC_POSNORM:
	case FPC_NEGNORM:
	case FPC_POSDENORM:
	case FPC_NEGDENORM:
		env = fp_setenv(FP_DEFENV);

		/*
		 * We make a range check here to avoid two different overflow
		 * conditions.  If x is a very large extended precision number,
		 * then x*log2(e) can overflow to infinity, which will then
		 * precipitate an invalid operation exception when computing
		 * ex-ei.  The more likely overflow avoided by this check is
		 * when ei is too large to fit in an integer.  In no case
		 * should exp() ever signal integer overflow.
		 */
		if (fp_abs(x) < (GENERIC)65536.0) {
			register long double ex, ei;

			/*
			 * Two subtle points here:
			 * (1) Exp2m1() might generate a spurious underflow
			 *     when ei = 0.  The spurious flag must be cleared.
			 *     The true underflow (and overflow) indication
			 *     comes from the scale() operation.
			 * (2) The inexact exception will always be signaled
			 *     because either the multiplication or the round()
			 *     operation (and usually both) will signal inexact.
			 */
			ex = x * log2_e;
			ei = fpe_round(ex);
			r = (GENERIC)1.0 + fpe_exp2m1(ex - ei);
			fp_clrflags(FPX_UNFL);
			r = fp_scalb(r, (int)ei);
		} else {
			/*
			 * When x >= 2^16, overflow is certain.
			 * When x <= -2^16, underflow is certain.
			 */
			if (x > 0) {
				fp_setflags(FPX_OVFL | FPX_INEX);
				r = fp_posinf;
			} else {
				fp_setflags(FPX_UNFL | FPX_INEX);
				r = (GENERIC)0.0;
			}
		}
	}
	traps = fp_restorenv(env, fp_getflags());
	if (traps == FPX_NONE)
		return r;
	return fp_fault(FPSL_EXP, traps, r, x);
}