Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!ut-sally!utah-cs!utah-gr!stride!l5comp!scotty
From: scotty@l5comp.UUCP (Scott Turner)
Newsgroups: comp.sys.amiga
Subject: Re: Manx C
Message-ID: <325@l5comp.UUCP>
Date: Mon, 17-Aug-87 04:17:53 EDT
Article-I.D.: l5comp.325
Posted: Mon Aug 17 04:17:53 1987
Date-Received: Tue, 18-Aug-87 02:27:54 EDT
References: <4540@jade.BERKELEY.EDU> <1836@vax135.UUCP>
Reply-To: scotty@l5comp.UUCP (Scott Turner)
Organization: L5 Computing, Edmonds, WA
Lines: 55
Keywords: read the manual carefully
Summary: Let's optimize it right shall we?

Yes, movea.{w,l} ea,a{0..7} doesn't modify the condition codes. But deciding "Hey, I can fix that by shoving it in a data register" doesn't always work out well in an optimizing compiler either. Consider that what you have done is place the value in question into TWO registers, thus using twice the number of registers needed. Registers are valuable, to say the least.

Dr. Wirth, in his recent (well, kinda recent :) paper on compiler code generation vs. modern CPUs, makes the same mistake that a lot of people make in studying code generation: they look at distinct pieces of code with nothing in front of them and nothing after them. In the real world this happens very, very, very rarely.

Let's take, for example, that lovely if (!=ptr) example:

>> movea.l ,Ax
>> cmpa.l #,Ax
>> bne not_equal

This is what brought on the discussion about slam dancing a data register to set the condition codes. What's wrong with this code?

First, may ALREADY BE IN Ax! If the compiler is truly doing a good job of optimization, then there is a very real probability that Ax is already loaded with . Reloading it would be a waste.

Second, the BEST optimizations often come from careful analysis of the CPU manual, because CPU designers often slip in little things to help out code generation. In this case is 0.
Most compilers take a look at the right-hand side and say "Whoa! That's an address register, dem only comes in 32 bits." However, a careful reading of what Motorola really has to say shows that on a cmpa.w #,A{0..7} is sign-extended to 32 bits BEFORE it is compared with the 32 bits in A{0..7}. Thus for 's in the range 0..32767, cmpa.w can be used rather than cmpa.l. You'd be amazed how many compilers don't use cmpa.w in these cases.

Third, a highly optimizing compiler always keeps a continuous running record of WHAT is in each register. There may already be a 0 loaded in a register, just waiting to be used to check out our address register. Many of these compilers also go one step further and keep track of the use of immediate values; they will often review this list in order to load the most frequently used immediate values into registers.

Sure, even a cmpa.{w,l} D{0..7},A{0..7} takes two cycles longer than movea.l D{0..7},A{0..7}, which is the equivalent opcode for timing comparisons against the "slam dance a Dreg" approach. BUT two cycles is pretty cheap compared to how many cycles it takes to load that poor Dreg after it gets danced on.

Scott Turner
-- 
UUCP-stick: stride!l5comp!scotty | If you want to injure my goldfish just make
UUCP-auto: scotty@l5comp.UUCP    | sure I don't run up a vet bill.
GEnie: JST                       | "The bombs drop in 5 minutes" R. Reagan
"Pirated software? Just say *NO*!" S. Turner