Path: utzoo!utgpu!water!watmath!clyde!rutgers!mit-eddie!uw-beaver!cornell!rochester!ur-tut!sunybcs!boulder!hao!ames!ucbcad!ucbvax!hplabs!sdcrdcf!ism780c!nobody
From: nobody@ism780c.UUCP (Unprivileged user)
Newsgroups: comp.arch
Subject: Re: taken -vs- untaken branches, Fortran FREQUENCY declaration
Message-ID: <8513@ism780c.UUCP>
Date: 9 Jan 88 02:20:30 GMT
References: <496@cresswell.quintus.UUCP> <638@l.cc.purdue.edu> <836@ima.ISC.COM> <645@l.cc.purdue.edu>
Reply-To: marv@ism780.UUCP (Marvin Rubenstein)
Organization: Interactive Systems Corp., Santa Monica CA
Lines: 68

>> The FREQUENCY statement disappeared for two reasons, as far as I can tell.
>> The first is that it didn't improve the code much; changing the order of
>> the "branch if greater" vs. the "branch if less" instructions after a test
>> made little difference on the non-overlapped, non-pipelined 7094.  .....

First, a note on the history of FORTRAN and how it interacted with the IBM
704 architecture.  FORTRAN was originally developed as a programming language
for the IBM 704 computer.  The FREQUENCY statement was associated with the
arithmetic IF statement.  An IF statement like:

       IF (<EXPRESSION>) 1,2,3

was compiled into the following 704 machine code:

	<evaluate-expression>
	TZE  LABEL2      /* transfer to label 2 if result is zero */
	TMI  LABEL1      /* transfer to label 1 if result is minus */
	TRA  LABEL3      /* unconditional transfer to label 3 */

Now, the purpose of the FREQUENCY statement was to allow the programmer
to assert that the most frequent value of the <expression> was negative.
The hope was that the generated branching code would be:

	TMI  LABEL1      /* transfer to label 1 if result is minus */
	TZE  LABEL2      /* transfer to label 2 if result is zero */
	TRA  LABEL3      /* unconditional transfer to label 3 */

thus causing the processor to execute fewer branch instructions on average.
But alas, the semantics of FORTRAN forbade the above optimization!  The
reason is that 'minus' did not mean the same as 'less than zero' on the 704
hardware.  The 704 used signed magnitude representation for numbers, so
there were *two* forms of zero, +0 and -0, and the TMI instruction
would branch if the result was -0.  Therefore, the compiler was forced to
generate the zero test first, independent of any FREQUENCY assertion.  It was
for this reason that FREQUENCY was removed from FORTRAN.
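
To make the -0 problem concrete, here is a small sketch in Python (not 704
code).  It assumes a simplified model in which a value is just a sign bit
plus a magnitude, TZE looks only at the magnitude, and TMI looks only at the
sign.  The function names and the (sign, magnitude) tuple are made up for
illustration; the second value returned is the number of branch instructions
executed up to and including the one that transfers.

```python
# Simplified model of a sign-magnitude result: (sign_bit, magnitude).
# sign_bit == 1 means "minus", so (1, 0) is -0 and (0, 0) is +0.
# This is an illustrative assumption, not a full model of the 704 accumulator.

def tze(value):
    """TZE in this model: transfer if the magnitude is zero, sign ignored."""
    _sign, magnitude = value
    return magnitude == 0

def tmi(value):
    """TMI in this model: transfer if the sign is minus, magnitude ignored."""
    sign, _magnitude = value
    return sign == 1

def zero_first(value):
    """Order the compiler had to emit: TZE, TMI, TRA."""
    if tze(value):
        return ("LABEL2", 1)
    if tmi(value):
        return ("LABEL1", 2)
    return ("LABEL3", 3)

def minus_first(value):
    """Order FREQUENCY was hoping for: TMI, TZE, TRA."""
    if tmi(value):
        return ("LABEL1", 1)
    if tze(value):
        return ("LABEL2", 2)
    return ("LABEL3", 3)

if __name__ == "__main__":
    cases = {"-5": (1, 5), "-0": (1, 0), "+0": (0, 0), "+5": (0, 5)}
    for name, value in cases.items():
        print(name, zero_first(value), minus_first(value))
```

In this model the two orderings agree on every value except -0: the
zero-first sequence sends it to label 2, while the minus-first sequence
sends it to label 1.  The minus-first order does save one branch for truly
negative values, which is what FREQUENCY was after.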

The FORTRAN compiler did generate high quality code.  For example, the
inner loop of a matrix multiply could be written:
	 C(I,J) = C(I,J) + A(I,K)*B(K,J)
and the generated code looked like:

  loop     LDQ  a,ik         /* ik is an index register  2-cycles       */
	   FMP  b,kj         /* kj is an index register  4-cycles (ave) */
	   FAD  c,ij         /* ij is an index register  6-cycles (ave) */
	   STO  c,ij         /*                          2-cycles       */
	   TXI  *+1,kj,n     /* increment kj by row size 1-cycle        */
	   TXI  *+1,ik,1     /* increment ik by one      1-cycle        */
	   TXL  loop,ik,n+1  /* to loop if not done      1-cycle        */
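
As a quick check on the listing, the per-iteration cost can be added up
from the cycle counts in the comments.  This is only arithmetic on the
figures given above (using the stated averages for FMP and FAD); the
variable names are mine.

```python
# Cycle counts copied from the comments in the loop listing above.
loop_cycles = [
    ("LDQ", 2),
    ("FMP", 4),  # average, per the listing
    ("FAD", 6),  # average, per the listing
    ("STO", 2),
    ("TXI", 1),
    ("TXI", 1),
    ("TXL", 1),
]

# Total cycles for one trip through the loop.
total = sum(cycles for _op, cycles in loop_cycles)

# Cycles spent on index updates and the loop-closing branch.
index_and_branch = sum(
    cycles for op, cycles in loop_cycles if op in ("TXI", "TXL")
)

if __name__ == "__main__":
    print("cycles per iteration:", total)
    print("of which index/branch:", index_and_branch)
```

By these figures the three index and branch instructions account for 3 of
the 17 cycles in each iteration.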

>It is a property of the 70x(x) series, and most of the computers of that time,
>that a branch was almost costless, and the time required to save and restore
>all registers was approximately that of a single multiplication.
>-- 
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
>Phone: (317)494-6054
>hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet

Second, some history on computer architecture.  On the 709 and
7094 the instruction times were:
       branching       -- 1 cycle
       load, add, etc  -- 2 cycles
       multiply        -- 2 to 5 cycles
       save & restore  -- 52 cycles (all registers)
       subroutine call -- 7  cycles average (FORTRAN calling convention)
       709 cycle       -- 12 microseconds
       7094 cycle      -- 1.2 microseconds (average)
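
The quoted claim that a full save and restore cost about one multiplication
can be checked against the table with a little arithmetic.  The sketch below
uses only the figures listed above and takes the worst-case multiply
(5 cycles) as the most favorable comparison; the dictionary and function
names are hypothetical.

```python
# Figures copied from the table above.
CYCLE_US = {"709": 12.0, "7094": 1.2}  # microseconds per cycle
CYCLES = {
    "branch": 1,
    "load/add": 2,
    "multiply (worst case)": 5,
    "save & restore": 52,
    "subroutine call": 7,
}

def microseconds(op, machine):
    """Time for one operation on the given machine, in microseconds."""
    return CYCLES[op] * CYCLE_US[machine]

# How many worst-case multiplies fit in one full save and restore.
ratio = CYCLES["save & restore"] / CYCLES["multiply (worst case)"]

if __name__ == "__main__":
    for machine in CYCLE_US:
        t = microseconds("save & restore", machine)
        print(machine, "save & restore:", round(t, 1), "microseconds")
    print("save & restore vs. slowest multiply:", round(ratio, 1), "x")
```

On these numbers a save and restore of all registers costs roughly ten
times the slowest multiply, and more against a faster one.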

Marvin Rubinstein (Historian)  Interactive Systems.