Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!accuvax.nwu.edu!tank!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: defeating the optimiser (was Ambiguous C?)
Message-ID: <17195@mimsy.UUCP>
Date: 30 Apr 89 05:34:23 GMT
References: <111@ssp1.idca.tds.philips.nl> <17133@mimsy.UUCP> <10136@smoke.BRL.MIL>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 65

>In article <17133@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>If your compiler does not understand `volatile', and has no way to
>>disable optimisation, you are out of luck.  (You can resort to assembly
>>language subroutines.)

In article <10136@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>Back, back!  (Making the sign of the cross.)  No need to resort to
>assembly language for something so simple.
>
>What is the real problem here?  It's that the compiler knows that
>we only need to inspect one byte in order to determine the state of
>the bit.  So how do we outwit the compiler? ...
[various suggestions deleted]

As someone else has already pointed out, this approach leads to the
dreaded Compiler Upgrade Problem.  The next release of the compiler
may require you to change all of your defeat mechanisms.

As it happens, though, you can usually get away with only a few small
assembly routines---often you need only one for each special
instruction.  For instance, some Unibus devices respond differently to
a `bisw2' (r/m/w) instruction than they would to a `movw' (read) ...
`movw' (write) sequence.  But you need not write an entire driver in
assembly.  If the compiler will not cooperate, at worst you can write

        bisw(&reg, bits);

and have the routine

        _bisw:  .globl  _bisw
                .word   0
                bisw2   8(ap),*4(ap)
                ret

somewhere callable.
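
For comparison, here is a rough sketch of what a C stand-in with the
same calling interface might look like on a compiler that does
understand `volatile'.  Only the name `bisw' comes from the text
above; the variable `fake_reg', the 16-bit `unsigned short' type, and
the C body are assumptions made for illustration.  The sketch does not
promise a single read-modify-write bus cycle, since the compiler is
free to emit a separate read and write.

```c
#include <assert.h>

/* Hypothetical stand-in for a 16-bit device register.  A real driver
 * would use the device's mapped address instead of a plain variable. */
static volatile unsigned short fake_reg;

/* C stand-in with the same interface as the assembly _bisw routine:
 * set the given bits in the word that reg points to.
 * NOTE: the compiler may generate a read followed by a write here,
 * so this is not equivalent to bisw2 for devices that care. */
void bisw(volatile unsigned short *reg, unsigned short bits)
{
    assert(reg != 0);   /* sanity check only; not part of the original */
    *reg |= bits;
}
```

Callers would still write `bisw(&reg, bits);' exactly as above, so the
body can be swapped for the assembly routine without touching the rest
of the driver.  Where the device distinguishes the two bus sequences,
the assembly version remains the one to use.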

Often you can insert this sort of thing directly into the compiler's
assembly output (most serious compilers are capable of producing
assemblable code, even if their default is to produce object code
directly) to avoid subroutine call overhead.  Sun provide a program
called `inline' that uses this approach, and (I presume) also tries to
avoid unnecessary pushes and pops, changing something like

        pea     a4@(12)
        jsr     _readlong
        movl    #10,d1
        btst    d1,d0           | btst cannot test bit 10 directly

plus

        _readlong:
                movl    sp@(4),a0
                movl    a0@,d0
                rts

into

        lea     a4@(12),a0
        movl    a0@,d0
        movl    #10,d1
        btst    d1,d0

or even (if smart enough) merging the lea+movl into one movl.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris