Path: utzoo!attcan!uunet!husc6!mailrus!ames!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Branch Delay Annullment
Message-ID: <22065@amdcad.AMD.COM>
Date: 14 Jun 88 17:31:01 GMT
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 49

In a recent posting discussing the Motorola 88000 architecture, the
section on branching contained the following information:

| Which forms of branch delay are present in instruction set
| [execute N if no branch, execute N if branch, execute N always]?
|     execute 1 always and execute 1 if no branch
| What are the taken and not-taken cycle counts for each branch type,
| not including the N delayed instructions, if executed?
|     execute 1 always: 1 cycle, taken or not
|     execute 1 if no branch: 1 cycle untaken, 2 cycles taken

The last statement says that branch delay annulment ("squashing")
takes place if the branch is taken, causing the branch to take 2
cycles.  This is the opposite of what the SPARC annulled branch does --
it squashes the delay instruction of untaken branches.

Squashing on untaken branches seems more effective to me.  Take, for
example, a simple loop:

loop:
        load    r0, addr
        add     r0, r0, 1
        store   r0, addr
        add     addr, addr, 4
        add     count, count, 1
        cpge    bool, count, MAX
        jmpf    bool, loop
        nop

With untaken branch-delay squashing, we can rewrite this as

        load    r0, addr
loop:
        add     r0, r0, 1
        store   r0, addr
        add     addr, addr, 4
        add     count, count, 1
        cpge    bool, count, MAX
        jmpf    bool, loop      /* squashed on fall-through */
        load    r0, addr

and perform the load for the subsequent loop iteration in the delay
slot of the jump.  The nop disappears from the loop body, and the
extra load is squashed on the final fall-through.  Since loops are
usually executed many times, the annul-untaken form would seem to give
the best overall performance.

Any thoughts as to the benefits of the annul-taken form?

        -- Tim Olson
        Advanced Micro Devices
        (tim@delirun.amd.com)
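
For a rough sense of the saving, the two loop listings above can be
compared by counting issue slots.  This is only a sketch under
assumptions not stated in the post: every instruction takes one cycle,
load latency and memory stalls are ignored, a squashed delay slot still
occupies its cycle, and the loop runs at least once.  The function and
list names are made up for illustration.

```python
# Instruction sequences copied from the two listings in the post.
# One cycle per instruction is an assumption made for this sketch.
ORIGINAL_LOOP = ["load", "add", "store", "add", "add", "cpge", "jmpf", "nop"]
PROLOGUE = ["load"]  # load hoisted in front of the rewritten loop
REWRITTEN_LOOP = ["add", "store", "add", "add", "cpge", "jmpf", "load"]


def cycles_nop_slot(iterations):
    """Issue slots used when the delay slot always holds a nop."""
    return iterations * len(ORIGINAL_LOOP)


def cycles_annul_untaken(iterations):
    """Issue slots used when the delay slot holds the next iteration's load.

    The slot on the final (untaken) pass is squashed but is still counted,
    on the assumption that annulment does not give the cycle back.
    """
    return len(PROLOGUE) + iterations * len(REWRITTEN_LOOP)


if __name__ == "__main__":
    for n in (1, 10, 100):
        saved = cycles_nop_slot(n) - cycles_annul_untaken(n)
        print(n, cycles_nop_slot(n), cycles_annul_untaken(n), saved)
```

Under these assumptions the rewritten loop uses 1 + 7N slots against 8N
for the original, so it saves N - 1 slots over N iterations and breaks
even only when the loop body runs once.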