Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ihnp4!homxb!whuts!mtune!rutgers!cmcl2!nrl-cmf!ames!sgi!decwrl!granite!jmd
From: jmd@granite.dec.com (John Danskin)
Newsgroups: comp.arch
Subject: Re: taken -vs- untaken branches, Fortran FREQUENCY declaration
Message-ID: <180@granite.dec.com>
Date: 26 Jan 88 21:20:16 GMT
References: <839@ima.ISC.COM> <2158@geac.UUCP> <604@bnr-rsc.UUCP>
Reply-To: jmd@granite.UUCP (John Danskin)
Organization: DEC Workstation Systems Engineering
Lines: 58

In article <604@bnr-rsc.UUCP> tak@bnr-rsc.UUCP (Mike Takefman) writes:
>In article <2158@geac.UUCP> john@geac.UUCP (John Henshaw) writes:
>.    1. Design, code and test your program.
>.    2. Compile it for "branch profiling".
>.    3. Execute the program on a dataset large enough that there is
>.       sufficient confidence that the program's execution is truly
>.       representative of "standard production activity".
>.    4. Recompile the program. This recompilation should use the
>.       information gained from step 3.
>.There are a few assumptions here :-). I think they're obvious.
>
>One assumption that may not be obvious but is important is that this
>method is highly intrusive. This method will not work well for a
>real time application.
>
>For vanilla programming I fully endorse this view, but I believe
>that the challenge is in doing this non-intrusively.

What do you mean by intrusive? Do you mean that the optimizing process
can be unpredictable?

For a given set of inputs, a real time program can exhibit two
speed related characteristics:

    1) Fast enough.
    2) Not fast enough.

If the program is fast enough for some usual case, but not fast enough
for some exception, then there are four options:

    1) Profile with more instances of the exceptional behaviour
       to try to pump up the performance of those cases, at the
       expense of the other cases.
       This may require several iterations, and may expose a design
       problem, namely that given the state of compiler technology
       embodied in your compiler, there is no way to jiggle branch
       frequencies (and global register allocation characteristics
       etc.) to make all of the code sequences fast enough.

    2) Redo the parts that aren't fast enough in assembler.

    3) Change your algorithms so there is more time.

    4) Get an optimizing compiler that understands real time
       constraints. You say: this subroutine, given these sets of
       inputs, must complete in xxx microseconds. You would probably
       give these directives to the compiler in a meta language that
       you used while profiling.

Of course, this last option would be a lot of work 8-{).
-- 
John Danskin                            | decwrl!jmd
DEC Workstation Systems Engineering     | (415)853-6724
100 Hamilton Avenue                     | My comments are my own.
Palo Alto, CA  94306                    | I do not speak for DEC.
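
A rough illustration of option 1, not taken from any real tool: the
sketch below sorts profiled input cases into "fast enough" and "not
fast enough" against a deadline, then bumps up how often the slow cases
are run in the next profiling pass. The function names, case names,
timings, deadline and weighting factor are all hypothetical, chosen
only to show the shape of the iteration.

```python
# Hypothetical sketch: classify profiled input cases against a real-time
# deadline, then reweight the profiling mix toward the cases that miss it.
# All names and numbers here are illustrative assumptions.

def classify_cases(timings_us, deadline_us):
    """Split input cases into those that meet the deadline and those that miss it."""
    fast_enough = {}
    not_fast_enough = {}
    for case, elapsed_us in timings_us.items():
        if elapsed_us <= deadline_us:
            fast_enough[case] = elapsed_us
        else:
            not_fast_enough[case] = elapsed_us
    return fast_enough, not_fast_enough


def reweight_profile(run_counts, slow_cases, factor):
    """Return new profiling run counts with each slow case run `factor` times as often."""
    return {
        case: (count * factor if case in slow_cases else count)
        for case, count in run_counts.items()
    }


# Measured time per case (microseconds) from one profiling pass.
timings_us = {"usual": 80, "exception": 140}
fast, slow = classify_cases(timings_us, deadline_us=100)

# Run the slow cases more often in the next profiling pass.
new_counts = reweight_profile({"usual": 1000, "exception": 10}, slow, factor=20)
print(slow)
print(new_counts)
```

Each pass would be followed by a recompile and a fresh set of timings.
If the slow set never empties, that is the design problem mentioned
under option 1, and options 2 through 4 are what is left.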