Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!ncar!midway!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.arch
Subject: gcc and 80386 code (was Let's pretend)
Keywords: Intel, 586, windows
Message-ID: <28773@mimsy.umd.edu>
Date: 24 Dec 90 13:59:05 GMT
References: <3068@crdos1.crd.ge.COM> <1990Dec19.223934.1568@kithrup.COM> <1990Dec21.031846.5444@kithrup.COM>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 55

>In article <5874@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes:
>>Is this [awful 386 code for I/O] for real?

In article <1990Dec21.031846.5444@kithrup.COM> sef@kithrup.COM
(Sean Eric Fagan) writes:
>This code is very much for real, and was generated by a very good compiler:
>gcc 1.37.1 (with a couple of modifications).

Two points:

 - Whether gcc is `good' depends greatly on the amount of work that
   has been put into the machine dependent code generator.  I have no
   idea how much that is for the 386.  The VAX code generator falls
   down in a few areas, e.g., cleaning up after `&=~' operations:

	a &= ~(1 << f());

   generates a sequence of the form

	calls	$0,_f		# r0 = f()
	ashl	r0,$1,r0	# r0 = 1 << f()
	mcoml	r0,r0		# r0 = ~(1 << f())
	mcoml	_a,r1		# r1 = ~a
	bicl3	r1,r0,_a	# a = r0 & ~r1 (= r0 & ~~a = r0 & a)

   rather than the optimal

	calls	$0,_f
	ashl	r0,$1,r0	# r0 = 1 << f()
	bicl2	r0,_a		# a &= ~r0

And, considerably more important for this particular example,

 - gcc 1.x optimization across inline functions and asm() constructs
   is horrid.

gcc's common subexpression eliminator needs to be replaced; this is in
progress.  Its inline expander needs to be run earlier, at parse time
or initial RTL generation, not after initial code generation, even if a
post-code-generation phase is retained.  (The reason for the latter
is to expand routines in place when they are sufficiently short.  Short
source can compile to surprisingly long object code.
By doing initial code generation before inline expansion, you can catch
this; however, you lose all cse and constant propagation.  Clearly
routines *marked* `inline' should be expanded in line early.)

Optimizing across asm() is considerably harder.  RMS is rumoured to be
working on a `little language' for describing the effects of certain
asm()s.  The problem is that asm can do anything the machine can do,
and it is almost impossible to characterise some instructions (how
would you describe `rep cmpsb' to a compiler?  `If condition code bit Z
is set, then the registers are this way, otherwise they are that way':
this is the sort of thing human coders do for memcmp() routines that
makes this tricky).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
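
For the `&=~' example, a small C sketch may make it easier to see that
the long and short VAX sequences compute the same result.  Nothing here
comes from gcc itself: the function names are made up for illustration,
the bit index n stands in for the value returned by f(), and a 32-bit
longword is assumed.  Each statement mirrors one instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Long form: one statement per instruction of the five-instruction
 * sequence (the call to f() is replaced by the parameter n). */
uint32_t clear_bit_long(uint32_t a, unsigned n)
{
    uint32_t r0 = (uint32_t)1 << n;   /* ashl:  r0 = 1 << n        */
    r0 = ~r0;                         /* mcoml: r0 = ~(1 << n)     */
    uint32_t r1 = ~a;                 /* mcoml: r1 = ~a            */
    return r0 & ~r1;                  /* bicl3: r0 & ~~a = r0 & a  */
}

/* Short form: bicl2 clears in the destination every bit set in r0. */
uint32_t clear_bit_short(uint32_t a, unsigned n)
{
    uint32_t r0 = (uint32_t)1 << n;   /* ashl:  r0 = 1 << n        */
    return a & ~r0;                   /* bicl2: a &= ~r0           */
}

/* Asserts that both forms agree for every bit index 0..31 on a few
 * sample values (a spot check, not a proof). */
void check_sequences_agree(void)
{
    static const uint32_t samples[] = { 0u, 0xFFu, 0xDEADBEEFu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned n = 0; n < 32; n++)
            assert(clear_bit_long(samples[i], n) ==
                   clear_bit_short(samples[i], n));
}
```

The two extra mcoml instructions in the long form cancel: the double
complement of a is just a, which is the identity the shorter bicl2
sequence exploits.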