Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!intelca!mipos3!omepd!jimv
From: jimv@omepd (Jim Valerio)
Newsgroups: comp.lang.c
Subject: Re: Inline assembler; a quiz (long; sorry)
Message-ID: <829@omepd>
Date: Sun, 21-Jun-87 22:10:52 EDT
Article-I.D.: omepd.829
Posted: Sun Jun 21 22:10:52 1987
Date-Received: Tue, 23-Jun-87 01:34:43 EDT
References: <608@zen.UUCP> <2299@hoptoad.uucp> <21211@sun.uucp> <464@winchester.UUCP>
Reply-To: jimv@omepd.UUCP (Jim Valerio)
Organization: Intel Corp., Hillsboro
Lines: 332

This is a paraphrase and clarification of some mail that I sent to John
Mashey, which he urged me to post to the net.

In article <464@winchester.UUCP> mash@winchester.UUCP (John Mashey)
argues against inline assembly language:
>And finally, "asm" and global-optimizing compilers are fairly
>contradictory.  A good optimizer:
>	a) Mostly ignores register declarations.  It allocates
>	registers as appropriate.  The same variable may well appear
>	in several different places during the code.
>	b) May, given slightly different source code, rearrange the
>	register use substantially.
>	c) Will find it VERY hard to figure out what an arbitrary
>	"asm statement" is doing, in terms of side-effects.

I don't believe that inline assembly language and globally optimizing
compilers need to be contradictory.  In particular, the compiler I've
been using here implements inline assembly language in such a way that
(a) and (b) are transparent, and (c) is not a problem in practice.

When I say inline assembly language, I do not mean the usage found, for
example, in 4.3bsd Unix, where an arbitrary string is randomly inserted
between C statements.  The compiler I've been using here implements
inline assembly language functions using a construction similar to the
asm function declarations found in the PCC2 (?) C compiler.  A sample of
our syntax and usage can be found at the end of this article.

Basically, what the programmer does is declare a function in the usual C
way, but with the special storage class "asm".  The body of the function
is coded as a set of assembly language templates, one of which is
selected by the compiler depending on the storage class of the operands
and results.  Registers, temporary storage, local labels, and so on are
declared by name, and the compiler just uses its regular allocation
schemes and naming conventions to generate the true names.

The programmer is expected to follow certain sensible rules when writing
the code templates.  For example, the source operands to the function
may not be modified unless the compiler matches a template saying that
the operand is a temporary.

Similarly, the compiler has sensible rules to follow when generating
code that interacts with code in asm functions.  For example, when the
compiler does not, or chooses not to, recognize the side-effects of the
instructions in the selected asm template, it makes a worst-case
assumption about side effects and performs the cleanup actions it would
do if a subroutine call were emitted.

I've found this asm function facility useful in a variety of situations.
I don't use it for inline functions that I can write in C, since we have
a separate facility for that (thanks, Steve).  Instead, I use it in the
following ways (and more).

1) I've used it most in our floating-point math library, to cause calls
   to some basic floating-point functions to emit the processor's
   equivalent instruction inline.

2) I've used it to get at processor control registers, such as examining
   floating-point exception flags.

3) I've used it to do double-precision and quad-precision integer
   arithmetic by defining the simple asm functions that get at the
   add-with-carry sorts of instructions, and the extended multiply and
   divide instructions.

4) Last, I've found the asm functions to be useful for generating
   particular test cases for odd instruction combinations or
   instructions not normally generated by our compiler; it allows me to
   do most of the work using our regular programming environment, and to
   dive down to assembly language only for the particular case I care
   about.

Thus, in 3 cases the benefit is efficiency without sacrificing the
advantages of writing most of the code in C, and in the other case the
benefit is programmer convenience for knock-off programs.

In his mail to me, John argues that these 4 cases are less important to
them because in their implementation:
	a) leaf procedures like these only have a 2-cycle overhead,
	b) in most cases, features that cannot be gotten to from a
	   high-level language don't exist,
	c) the compiler would need to know a great deal about the
	   side effects, and a function call automatically has the
	   right effects.

In our case, (c) is the same, (a) is nearly the same, and (b) is
apparently a legitimate difference.  (I suspect that the true cost of
leaf procedures is more than 2 cycles here, when you count overhead due
to i-cache misses, software calling conventions required for separate
linking, and whatever.)

In conclusion (and despite the reasons John gives), I feel that the
inline assembly language function facility was well worth its
implementation cost.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

--- appended example, as promised above ---

/*
 * You will find below fragments of the standard header file that defines
 * the many inline and asm functions used in our floating-point support
 * library, followed by a sample function that uses these (and other)
 * functions.  In the "generic" function fp_exp, there is only 1 true
 * subroutine call: fp_fault.  All the other functions are either inline
 * functions or asm functions.
 */

/*
 * Return the current floating-point environment.
 */
asm fp_env fp_getenv(void)
{
%reglit return;
	modac	0,0,return
%error;
}

/*
 * Set the floating-point environment to `env',
 * and return the previous environment.
 */
asm fp_env fp_setenv(fp_env env)
{
%reglit return; reglit env;
	ldconst	0xdf1f0000,return
	modac	return,env,return
%error;
}

/*
 * Return the current exception flags.
 */
inline fp_except fp_getflags(void)
{
	register fp_env env;

	env = fp_getenv();
	return env.flags;
}

/*
 * Set exception flags to `flags', and return the previous
 * flags.
 */
asm fp_except fp_setflags(fp_except flags)
{
/*
 * Optimized cases when return value is ignored.
 */
%void return; const(0) flags; tmpreg mask;	/* FPX_NONE */
	ldconst	31<<16,mask
	modac	mask,0,mask
%void return; const flags; tmpreg mask,tflags;
	ldconst	31<<16,mask
	ldconst	flags<<16,tflags
	modac	mask,tflags,mask
%void return; tmpreg flags; tmpreg mask;
	ldconst	31<<16,mask
	shlo	16,flags,flags
	modac	mask,flags,mask
%void return; reglit flags; tmpreg mask,tflags;
	ldconst	31<<16,mask
	shlo	16,flags,tflags
	modac	mask,tflags,mask
/*
 * Same as above, but return value required.
 * The mask is loaded into `return', which then receives the old flags.
 */
%reglit return; const(0) flags;			/* FPX_NONE */
	ldconst	31<<16,return
	modac	return,0,return
	shlo	11,return,return
	shro	27,return,return
%reglit return; const flags; tmpreg tflags;
	ldconst	31<<16,return
	ldconst	flags<<16,tflags
	modac	return,tflags,return
	shlo	11,return,return
	shro	27,return,return
%reglit return; tmpreg flags;
	ldconst	31<<16,return
	shlo	16,flags,flags
	modac	return,flags,return
	shlo	11,return,return
	shro	27,return,return
%reglit return; reglit flags; tmpreg tflags;
	ldconst	31<<16,return
	shlo	16,flags,tflags
	modac	return,tflags,return
	shlo	11,return,return
	shro	27,return,return
%error;
}

/*
 * Restore the previously saved environment `env', adding in whatever
 * masked exceptions (from `flags') that have occurred.
 * The unmasked exceptions (i.e. traps) are returned.
 */
inline fp_except fp_restorenv(fp_env env, fp_except flags)
{
	register fp_except masks;

	masks = env.masks;
	env.flags |= flags & masks;
	fp_setenv(env);
	return (flags & ~masks);
}

/*
 * Scale `x' by `i' (i.e. multiply x by 2^i).
 */
asm float fps_scalb(float x, int i)
{
%reglit return; reglit x, i;
	scaler	i,x,return
%error;
}

asm double fpd_scalb(double x, int i)
{
%reglit(2) return; reglit(2) x; reglit i;
	scalerl	i,x(0),return(0)
%error;
}

/*
 * The exponential function e^x is approximated using the mathematical
 * identity:
 *	e^x = 2^(log2(e) * x)
 * Since the underlying approximation function is 2^f - 1, for abs(f) <= 0.5,
 * the actual algorithm used is
 *	e^x = scale( (2^f - 1) + 1, I)
 * where (log2(e) * x) = I+f and I is the (IEEE) nearest integer to
 * (log2(e) * x).
 */
GENERIC
fp_exp(GENERIC x)
{
	register GENERIC r;
	register fp_except traps;
	register fp_env env;

	switch (fp_class(x)) {
	case FPC_POSZERO:
	case FPC_NEGZERO:
		return (GENERIC)1.0;
	case FPC_POSINF:
		return x;
	case FPC_NEGINF:
		return (GENERIC)0.0;
	case FPC_POSQNAN:
	case FPC_NEGQNAN:
	case FPC_POSSNAN:
	case FPC_NEGSNAN:
		env = fp_setenv(FP_DEFENV);
		r = fp_nan1(x);		/* propagate NaN */
		break;
	case FPC_POSNORM:
	case FPC_NEGNORM:
	case FPC_POSDENORM:
	case FPC_NEGDENORM:
		env = fp_setenv(FP_DEFENV);
		/*
		 * We make a range check here to avoid two different overflow
		 * conditions.  If x is a very large extended precision number,
		 * then x*log2(e) can overflow to infinity, which will then
		 * precipitate an invalid operation exception when computing
		 * ex-ei.  The more likely overflow avoided by this check is
		 * when ei is too large to fit in an integer.  In no case
		 * should exp() ever signal integer overflow.
		 */
		if (fp_abs(x) < (GENERIC)65536.0) {
			register long double ex, ei;
			/*
			 * Two subtle points here:
			 * (1) Exp2m1() might generate a spurious underflow
			 *     when ei = 0.  The spurious flag must be cleared.
			 *     The true underflow (and overflow) indication
			 *     comes from the scale() operation.
			 * (2) The inexact exception will always be signaled
			 *     because either the multiplication or the round()
			 *     operation (and usually both) will signal inexact.
			 */
			ex = x * log2_e;
			ei = fpe_round(ex);
			r = (GENERIC)1.0 + fpe_exp2m1(ex - ei);
			fp_clrflags(FPX_UNFL);
			r = fp_scalb(r, (int)ei);
		} else {
			/*
			 * When x >= 2^16, overflow is certain.
			 * When x <= -2^16, underflow is certain.
			 */
			if (x > 0) {
				fp_setflags(FPX_OVFL | FPX_INEX);
				r = fp_posinf;
			} else {
				fp_setflags(FPX_UNFL | FPX_INEX);
				r = (GENERIC)0.0;
			}
		}
	}
	traps = fp_restorenv(env, fp_getflags());
	if (traps == FPX_NONE)
		return r;
	return fp_fault(FPSL_EXP, traps, r, x);
}