Path: utzoo!utgpu!water!watmath!clyde!rutgers!mit-eddie!uw-beaver!cornell!rochester!ur-tut!sunybcs!boulder!hao!ames!ucbcad!ucbvax!hplabs!sdcrdcf!ism780c!nobody
From: nobody@ism780c.UUCP (Unprivileged user)
Newsgroups: comp.arch
Subject: Re: taken -vs- untaken branches, Fortran FREQUENCY declaration
Message-ID: <8513@ism780c.UUCP>
Date: 9 Jan 88 02:20:30 GMT
References: <496@cresswell.quintus.UUCP> <638@l.cc.purdue.edu> <836@ima.ISC.COM> <645@l.cc.purdue.edu>
Reply-To: marv@ism780.UUCP (Marvin Rubenstein)
Organization: Interactive Systems Corp., Santa Monica CA
Lines: 68

>> The FREQUENCY statement disappeared for two reasons, as far as I can tell.
>> The first is that it didn't improve the code much; changing the order of
>> the "branch if greater" vs. the "branch if less" instructions after a test
>> made little difference on the non-overlapped, non-pipelined 7094. .....

First, a note on the history of FORTRAN and how it interacted with the
IBM 704 architecture.

FORTRAN was originally developed as a programming language for the IBM 704
computer. The FREQUENCY statement was associated with the arithmetic IF
statement. An IF statement like:

        IF (expression) 1,2,3

was compiled into the following 704 machine code:

        TZE  LABEL2    /* transfer to label 2 if result is zero  */
        TMI  LABEL1    /* transfer to label 1 if result is minus */
        TRA  LABEL3    /* unconditional transfer to label 3      */

Now, the purpose of the FREQUENCY statement was to allow the programmer
to assert that, for example, the most frequent value of the expression
was negative. The hope was that the generated branching code would be:

        TMI  LABEL1    /* transfer to label 1 if result is minus */
        TZE  LABEL2    /* transfer to label 2 if result is zero  */
        TRA  LABEL3    /* unconditional transfer to label 3      */

thus causing the processor to execute fewer branch instructions on
average. But alas, the semantics of FORTRAN forbade the above
optimization! The reason is that 'minus' did not mean the same as
'less than zero' on the 704 hardware.
The 704 used signed-magnitude representation for numbers, so there were
*two* forms of zero, +0 and -0, and the TMI instruction would branch if
the result was -0. Therefore, the compiler was forced to generate the
zero test first, independent of any FREQUENCY assertion. It was for this
reason that FREQUENCY was removed from FORTRAN.

The FORTRAN compiler did generate high quality code. For example, the
inner loop of a matrix multiply could be written:

        C(I,J) = C(I,J) + A(I,K)*B(K,J)

and the generated code looked like:

loop    LDQ  a,ik        /* ik is an index register     2 cycles       */
        FMP  b,kj        /* kj is an index register     4 cycles (ave) */
        FAD  c,ij        /* ij is an index register     6 cycles (ave) */
        STO  c,ij        /*                             2 cycles       */
        TXI  *+1,kj,n    /* increment kj by row size    1 cycle        */
        TXI  *+1,ik,1    /* increment ik by one         1 cycle        */
        TXL  loop,ik,n+1 /* to loop if not done         1 cycle        */

>It is a property of the 70x(x) series, and most of the computers of that time,
>that a branch was almost costless, and the time required to save and restore
>all registers was approximately that of a single multiplication.
>--
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
>Phone: (317)494-6054
>hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet

Second, some history on computer architecture. On the 709 and 7094 the
instruction times were:

        branching        -- 1 cycle
        load, add, etc   -- 2 cycles
        multiply         -- 2 to 5 cycles
        save & restore   -- 52 cycles (all registers)
        subroutine call  -- 7 cycles average (FORTRAN calling convention)

        709 cycle        -- 12 microseconds
        7094 cycle       -- 1.2 microseconds (average)

        Marvin Rubinstein (Historian)
        Interactive Systems.
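
P.S. For readers who want to see the -0 problem concretely, here is a
rough sketch in Python. It is only a toy model built from the description
above, not real 704 behavior in any detail: the Word type and the function
names tze, tmi, zero_first and minus_first are made up for illustration,
and a word is reduced to a sign flag plus a magnitude.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Toy signed-magnitude value (hypothetical model, not a real 704 word)."""
    negative: bool   # sign bit: True means "minus"
    magnitude: int   # absolute value


def tze(w: Word) -> bool:
    """Model of TZE: taken when the magnitude is zero, whatever the sign."""
    return w.magnitude == 0


def tmi(w: Word) -> bool:
    """Model of TMI: taken when the sign is minus, so it also fires on -0."""
    return w.negative


def zero_first(w: Word) -> int:
    """Branch order the compiler had to emit: TZE, TMI, TRA."""
    if tze(w):
        return 2
    if tmi(w):
        return 1
    return 3


def minus_first(w: Word) -> int:
    """Branch order FREQUENCY hoped for: TMI, TZE, TRA."""
    if tmi(w):
        return 1
    if tze(w):
        return 2
    return 3


MINUS_ZERO = Word(negative=True, magnitude=0)
PLUS_ZERO = Word(negative=False, magnitude=0)
MINUS_FIVE = Word(negative=True, magnitude=5)
PLUS_FIVE = Word(negative=False, magnitude=5)

if __name__ == "__main__":
    for w in (MINUS_FIVE, MINUS_ZERO, PLUS_ZERO, PLUS_FIVE):
        print(w, "zero-first ->", zero_first(w), "minus-first ->", minus_first(w))
```

The two orderings agree on every value except -0: the zero-first order
sends it to label 2, as the arithmetic IF requires, while the minus-first
order sends it to label 1.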