Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!sdd.hp.com!mips!zalman
From: zalman@mips.com (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <2848@spim.mips.COM>
Date: 30 Apr 91 22:08:50 GMT
References: <12739@pt.cs.cmu.edu> <41612@cup.portal.com> <63942@bbn.BBN.COM>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Sunnyvale, California
Lines: 40
Nntp-Posting-Host: dish.mips.com

In article <63942@bbn.BBN.COM> pplacewa@bbn.com (Paul W Placeway) writes:
[...]
>After hacking too many DSP things, all I have to say about loop
>unrolling is that it's a good technique to make up for a bad
>architecture.
>
>What I want in my processor is a zero-overhead-per-loop down-counting
>loop instruction. The TMS320 series, and Motorola DSP 56000 and 96000
[...]
> -- Paul Placeway

The IBM RISC System/6000 (RIOS) has such an instruction. In fact, you
can also fold a condition register bit into the loop test, and so long
as the condition is computed far enough in advance (three cycles), the
branch is free. A simple decrement-and-branch loop instruction is
always free.

Of course there are tradeoffs to consider here. Setting up the loop
count register takes longer than putting the value into a general
purpose register (GPR). (I'm not sure what the latency of moving a GPR
to the count register is. I'd guess two or three cycles. Anyone out
there know?) If there are function calls in the loop, you have to save
and restore the count register. If you need to use the loop variable as
an index, you need an add in the loop anyway. (On the RIOS you usually
finesse this by using a load-and-update instruction to step an
induction variable.)

All in all, this is a win for the IBM machine since they have already
decoupled the integer ALU and instruction fetching. It's not clear how
this would apply to other RISC architectures, though. A first criterion
would be for the instruction to take a GPR for the counter.
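
For readers who want the tradeoffs above in source form, here is a
rough C sketch. It only shows the loop shapes being discussed; what a
given compiler emits for them is not guaranteed, and the function names
(sum_indexed, sum_countdown, find_first) are made up for illustration.

```c
#include <stddef.h>

/* Indexed form: i is both the trip counter and the array index, so
 * the increment of i is needed no matter how the branch is done. */
long sum_indexed(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Count-down form: `count` exists only to control the loop, which is
 * the part a decrement-and-branch instruction could absorb. The
 * pointer bump (*p++) stands in for a load-and-update that steps the
 * induction variable. */
long sum_countdown(const long *a, size_t n)
{
    long sum = 0;
    const long *p = a;
    for (size_t count = n; count != 0; count--)
        sum += *p++;
    return sum;
}

/* Count-down loop with an extra condition folded into the loop test:
 * continue while the count is nonzero AND the key has not been seen.
 * Returns the index of the first match, or n if there is none. */
size_t find_first(const long *a, size_t n, long key)
{
    const long *p = a;
    for (size_t count = n; count != 0 && *p != key; count--)
        p++;
    return (size_t)(p - a);
}
```

In the last two functions, `count` is the value a decrement-and-branch
instruction would own. On a machine without a dedicated count register
it would simply live in a GPR.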
It would also have a delay slot. I doubt such an instruction requires
an extra write port into the register file. (The branch instruction
doesn't write anything, so its writeback slot can be used to update the
counter register.) It might require some extra ALU hardware, since the
count register gets decremented and an offset is added to the PC in one
cycle. It's also a lot less important in certain superscalar
implementations where the counter update can overlap something else in
the loop.
--
Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94088
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman (408) 524 8395
"Never rub another man's rhubarb" -- Pop Will Eat Itself