Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!ncar!noao!nud!tom
From: tom@nud.UUCP (Tom Armistead)
Newsgroups: comp.arch
Subject: Re: Branch Delay Annullment
Message-ID: <1081@nud.UUCP>
Date: 15 Jun 88 16:59:16 GMT
References: <22065@amdcad.AMD.COM>
Reply-To: tom@nud.UUCP (Tom Armistead)
Organization: Motorola Microcomputer Division, Tempe, Az.
Lines: 55

In article <22065@amdcad.AMD.COM> tim@amdcad.AMD.COM (Tim Olson) writes:
[Concerning 88k branch delay slot handling ]
>This is the opposite of what the SPARC annulled branch does -- it
>squashes untaken branches.  Squashing the untaken branches seems more
>effective to me.  Take, for example, a simple loop:
>        load    r0, addr
>loop:
>        add     r0, r0, 1
>        store   r0, addr
>        add     addr, addr, 4
>        add     count, count, 1
>        cpge    bool, count, MAX
>        jmpf    bool, loop      /* squashed on fall-through */
>        load    r0, addr
>of the jump.  Since loops are usually executed many times, the
>annul-untaken form would seem to give the best overall performance.
>Any thoughts as to the benefits of annul-taken form?

The same loop can be written in 88K asm as:
(I'm not familiar with SPARC code so I hope this is equivalent - it
illustrates the point anyway).
(addr, count, bool are registers I presume.)

loop:   ld      r2,addr,0
        add     r2,r2,1
        st      r2,addr,0
        add     count,count,1
        cmp     bool,count,MAX
        bb0.n   eq,bool,loop    ; This branch effectively takes
        add     addr,addr,4     ; only 1 tick.

The number of loop instructions is equivalent in either the annul-taken
form or the always executed form (for this example anyway).  The only
slight difference is that no "cleanup" instruction was required in the
always executed form.  There might be some instances where annul-taken
is better but I don't know of any specific ones.

As an aside, the 88k has some addressing modes which will allow the
above code to be written more efficiently as:

        add     count,r0,MAX-1
loop:   ld      r2,addr[count]
        add     r2,r2,1
        st      r2,addr[count]
        bcnd.n  ne0,count,loop  ; This branch effectively takes
        sub     count,count,1   ; only 1 tick.
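
For anyone who wants to check the instruction counts, here is a rough
Python model of the two 88k loops above.  It is only a sketch under my
own assumptions: the function names are made up, a Python list stands in
for the words at addr (one list slot per 4-byte word), count is assumed
to start at 0 in the first loop, and MAX is assumed to be at least 1.
It counts instructions executed, not clock ticks.

```python
def run_compare_loop(mem, max_count):
    """Model of the cmp/bb0.n loop. Returns (new memory, instructions executed)."""
    mem = list(mem)
    index = 0      # stands in for addr, in words (assumption)
    count = 0      # assumed initial value; the post does not show it
    executed = 0
    while True:
        r2 = mem[index]               # ld    r2,addr,0
        r2 += 1                       # add   r2,r2,1
        mem[index] = r2               # st    r2,addr,0
        count += 1                    # add   count,count,1
        taken = count != max_count    # cmp + bb0.n eq: branch while count != MAX
        index += 1                    # add   addr,addr,4 (delay slot, always executed)
        executed += 7
        if not taken:
            break
    return mem, executed


def run_indexed_loop(mem, max_count):
    """Model of the bcnd.n loop. Returns (new memory, instructions executed)."""
    mem = list(mem)
    count = max_count - 1             # add   count,r0,MAX-1 (setup, outside the loop)
    executed = 1
    while True:
        r2 = mem[count]               # ld    r2,addr[count]
        r2 += 1                       # add   r2,r2,1
        mem[count] = r2               # st    r2,addr[count]
        taken = count != 0            # bcnd.n ne0,count,loop (tests count before the slot)
        count -= 1                    # sub   count,count,1 (delay slot, always executed)
        executed += 5
        if not taken:
            break
    return mem, executed


if __name__ == "__main__":
    print(run_compare_loop([10, 20, 30], 3))
    print(run_indexed_loop([10, 20, 30], 3))
```

With MAX = 3 both models increment all three words; the first executes
7 instructions per iteration (21 total), the second 5 per iteration plus
the one setup instruction (16 total).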
-- 
Just a few more bits in the stream.
The Sneek