Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!ncar!midway!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.arch
Subject: gcc and 80386 code (was Let's pretend)
Keywords: Intel, 586, windows
Message-ID: <28773@mimsy.umd.edu>
Date: 24 Dec 90 13:59:05 GMT
References: <3068@crdos1.crd.ge.COM> <1990Dec19.223934.1568@kithrup.COM> <1990Dec21.031846.5444@kithrup.COM>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 55

>In article <5874@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes:
>>Is this [awful 386 code for I/O] for real?

In article <1990Dec21.031846.5444@kithrup.COM> sef@kithrup.COM
(Sean Eric Fagan) writes:
>This code is very much for real, and was generated by a very good compiler:
>gcc 1.37.1 (with a couple of modifications).

Two points:

 - Whether gcc is `good' depends greatly on the amount of work that has
   been put into the machine-dependent code generator.  I have no idea
   how much that is for the 386.  The VAX code generator falls down in a
   few areas, e.g., cleaning up after `&=~' operations:

	a &= ~(1 << f());

   generates a sequence of the form

	calls	$0,_f		# r0 = f()
	ashl	r0,$1,r0	# r0 = 1 << f()
	mcoml	r0,r0		# r0 = ~(1 << f())
	mcoml	_a,r1		# r1 = ~a
	bicl3	r1,r0,_a	# a = r0 & ~r1 (= r0 & ~~a = r0 & a)

   rather than the optimal

	calls	$0,_f
	ashl	r0,$1,r0	# r0 = 1 << f()
	bicl2	r0,_a		# a &= ~r0
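
   To see that the two sequences agree, here is a small C model of
   them.  It is only a sketch: bic() is a hypothetical stand-in for the
   VAX bit-clear operation (dst = src & ~mask), the function names are
   mine, and uint32_t stands in for a VAX longword.

```c
#include <stdint.h>

/* Hypothetical model of VAX "bicl3 mask,src,dst": yields src & ~mask. */
static uint32_t bic(uint32_t mask, uint32_t src)
{
    return src & ~mask;
}

/* The long sequence: complement both operands, then bit-clear. */
static uint32_t clear_bit_long(uint32_t a, unsigned n)
{
    uint32_t r0 = (uint32_t)1 << n;  /* ashl  r0,$1,r0   */
    uint32_t r1;

    r0 = ~r0;                        /* mcoml r0,r0      */
    r1 = ~a;                         /* mcoml _a,r1      */
    return bic(r1, r0);              /* bicl3 r1,r0,_a   */
}

/* The short sequence: bit-clear does the complement for free. */
static uint32_t clear_bit_short(uint32_t a, unsigned n)
{
    uint32_t r0 = (uint32_t)1 << n;  /* ashl  r0,$1,r0   */

    return bic(r0, a);               /* bicl2 r0,_a      */
}
```

   Both return a & ~(1 << n) for n in 0..31; the long form just
   computes ~(1 << n) & ~~a to get there.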

And, considerably more important for this particular example,

 - gcc 1.x optimization across inline functions and asm() constructs is
   horrid.  gcc's common subexpression eliminator needs to be replaced;
   this is in progress.  Its inline expander needs to be run earlier, at
   parse time or initial RTL generation, not after initial code generation,
   even if a post-code-generation phase is retained. (The reason for
   retaining the later phase is to expand routines in place when they
   turn out to be sufficiently short.  Short source can compile to
   surprisingly long object code.  Doing initial code generation before
   inline expansion lets you catch this; however, you lose all cse and
   constant propagation across the call.  Clearly routines *marked*
   `inline' should be expanded in line early.)
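
   A small illustration of what is lost, as a sketch only: scale() and
   halve() are hypothetical names, not anything from gcc or from the
   code under discussion.  If the call is expanded before cse and
   constant propagation run, the constant arguments fold the body down
   to a single shift; expanded afterwards, the general body is pasted
   in more or less as is.

```c
/* Hypothetical helper, marked inline. */
static inline unsigned scale(unsigned x, unsigned shift, int round)
{
    if (round)
        x += (1u << shift) >> 1;   /* add half of the divisor */
    return x >> shift;
}

/* What the programmer writes: both extra arguments are constants. */
static unsigned halve(unsigned x)
{
    return scale(x, 1, 0);
}

/*
 * What early expansion followed by constant propagation can reduce
 * halve() to: the `round' test disappears and the shift count is known.
 */
static unsigned halve_expanded(unsigned x)
{
    return x >> 1;
}
```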

   Optimizing across asm() is considerably harder.  RMS is rumoured to
   be working on a `little language' for describing the effects of
   certain asm()s.  The problem is that asm can do anything the machine
   can do, and it is almost impossible to characterise some instructions
   (how would you describe `rep cmpsb' to a compiler?  Something like
   `if condition code bit Z is set, then the registers are this way,
   otherwise they are that way'.  Human coders rely on exactly this
   sort of thing in memcmp() routines, and that is what makes it
   tricky).
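
   To make the `this way, otherwise that way' point concrete, here is a
   rough C model of a repeat-while-equal byte compare and of a memcmp()
   built on it.  This is a sketch under assumptions: struct regs, its
   fields, and both function names are hypothetical, the direction flag
   is taken to be clear, and presetting Z before the loop is my
   assumption about how a hand-coded routine copes with a zero count.

```c
#include <stddef.h>

/* Hypothetical register file: two pointers, a count, and the Z flag. */
struct regs {
    const unsigned char *si;
    const unsigned char *di;
    size_t cx;
    int zf;
};

/* Compare bytes while they are equal and the count is nonzero. */
static void repe_cmpsb(struct regs *r)
{
    while (r->cx != 0) {
        r->zf = (*r->si == *r->di);  /* the compare sets Z */
        r->si++;                     /* pointers advance even on mismatch */
        r->di++;
        r->cx--;
        if (!r->zf)
            break;
    }
    /* A zero count executes nothing, so Z keeps its old value. */
}

static int model_memcmp(const void *a, const void *b, size_t n)
{
    struct regs r;

    r.si = a;
    r.di = b;
    r.cx = n;
    r.zf = 1;                        /* preset Z: zero bytes compare equal */
    repe_cmpsb(&r);
    if (r.zf)
        return 0;                    /* Z set: everything matched */
    /* Z clear: both pointers are one past the differing byte. */
    return (int)r.si[-1] - (int)r.di[-1];
}
```

   The caller has to know that the pointers sit one byte past the
   mismatch only when Z is clear, and that Z is meaningless unless it
   was preset or the count was nonzero.  Those are the conditional
   facts a compiler would need to be told.
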
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris