Path: utzoo!mnetor!uunet!husc6!bloom-beacon!gatech!hubcap!ncrcae!ncr-sd!crash!telesoft!roger
From: roger@telesoft.UUCP (Roger Arnold @prodigal)
Newsgroups: comp.arch
Subject: conditional branches
Message-ID: <191@telesoft.UUCP>
Date: 11 Feb 88 01:06:57 GMT
Organization: TeleSoft Inc., San Diego, CA
Lines: 49
Keywords: frequency statement, flag registers, invariant expressions


There's been a lot of discussion lately on the utility of FREQUENCY
statements and the subject of branch prediction, etc.  But nobody has 
been addressing an issue that looks (to me) to offer more potential
for improving performance.

A significant fraction of real code always seems to be devoted to
repeating tests whose outcome is "already known".  I'm thinking of
things like tests on input parameters, when the tests occur multiple
times within the same subprogram activation.  There may be a standard
technical term for what I'm talking about, but I'm not aware of it.
"Loop invariant conditional expressions" are a major subcategory, but
the category I want includes any test whose outcome can be determined
by another test or tests preceding it in the data flow--recognizing
that "another test" could include a prior execution of the same test.

It's easy for an optimizer that does global data flow analysis to
recognize such tests, but no architecture that I know of gives it  
anything very interesting to do with them.  Sure, for a complex loop 
invariant test, the optimizer can move the test into the loop pre-
header, and substitute a simple test on a boolean varible inside the 
loop.  Big deal.  The test on the variable and the conditional branch 
still end up sitting in the loop, taking up code space and execution 
cycles.  Or the optimizer can get wild, and set up one copy of the 
loop as the THEN part of an IF THEN ELSE, and another as the ELSE 
part.  That way the invariant tests, and any code that becomes dead 
as a result of the test hoisting, are eliminated from the loops.
Execution efficiency is optimized, but the high cost in code space 
makes it practical only for small loops.

What I'd like to see in an architecture, as a minimum, would be a set
of one bit registers into which boolean values can be written.  It's
silly to have to tie up an entire 32-bit register to hold a boolean
value.  Especially since the register will still have to be tested to
make its value accessible to the branch unit.  With addressable flag
registers, conditional branch instructions could directly reference 
the flag of interest, with no preceding test instruction.  The flag 
registers could be available to an early stage of the instruction 
decode pipe, so that the branch instructions referencing them would 
be promptly converted to no-ops or unconditional branches, as
appropriate.  Five of the flag registers could be reserved for the
standard ALU flags (zero, negative, positive, carry, and overflow), 
so that only one general form of conditional branch would be needed.

Now, who's going to tell me that I've just described their favorite
machine from out of the past?  Surely SOMEONE has used this idea
before?

- Roger Arnold				..sdcsvax!telesoft!roger