Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!sdd.hp.com!think.com!mintaka!bloom-beacon!eru!hagbard!sunic!mcsun!ukc!mucs!mshute
From: mshute@cs.man.ac.uk (Malcolm Shute)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <2518@m1.cs.man.ac.uk>
Date: 10 May 91 10:02:38 GMT
References: <12739@pt.cs.cmu.edu> <41612@cup.portal.com> <63942@bbn.BBN.COM>
Sender: news@cs.man.ac.uk
Reply-To: mshute@cs.man.ac.uk (Malcolm Shute)
Organization: Department of Computer Science, University of Manchester UK
Lines: 45

In article <63942@bbn.BBN.COM> pplacewa@bbn.com (Paul W Placeway) writes:
>What I want in my processor is a zero-overhead-per-loop down-counting
>loop instruction.  The TMS320 series, and Motorola DSP 56000 and 96000
>have had this sort of thing for quite a while.  For those of you who
>happen to be unfamilar, the idea is that the PC addressing hardware
>has a loop beginning, end, and count register and the hardware does a
>decrement-branch-nonzero when the PC == end-of-loop, resetting it to
>beginning-of-loop, while in the instruction fetch stage.

One interesting[*] idea which I saw once
[[*]where "interesting" == "novel", "neat" and "mind-stimulating",
but probably != "efficient", "commercially cost effective"]
made the instruction set completely block structured: the fetch part
of the instruction cycle worked with one stack (with the IR and PC as
the top-of-stack and next-on-stack, respectively); and the execute part
of the instruction cycle worked with the other (i.e. arithmetic) stack
(again with the top two items in registers... effectively the ACC and
ARG registers of the machine).

Every instruction had a repetition field which counted down to zero,
while the op-code part of the instruction was repeatedly obeyed this
number of times.  Most arithmetic instructions would just carry a
repetition count of one, and do their normal work on the arithmetic
stack.
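
To make the repetition field concrete, here is a minimal sketch in
Python.  The instruction layout (op-code, repetition count, operand),
the op-code names and the run() function are all assumptions made for
illustration; none of them is taken from the CAC design itself, and the
fetch stack is not modelled at all.

```python
def run(program):
    """Execute (opcode, repeat, operand) tuples on an arithmetic stack.

    Hypothetical encoding: only the idea of a per-instruction
    repetition field comes from the description above.
    """
    stack = []
    for opcode, repeat, operand in program:
        # The repetition field counts down to zero; the op-code is
        # obeyed once for each count.
        while repeat > 0:
            if opcode == "PUSH":
                stack.append(operand)
            elif opcode == "ADD":
                b = stack.pop()
                a = stack.pop()
                stack.append(a + b)
            else:
                raise ValueError(f"unknown opcode: {opcode}")
            repeat -= 1
    return stack


# Three pushes with a count of one, then a single ADD with a count of
# two, which folds the three values into one sum.
program = [
    ("PUSH", 1, 2),
    ("PUSH", 1, 3),
    ("PUSH", 1, 4),
    ("ADD", 2, None),
]
result = run(program)
```

With a count of one the ADD behaves as an ordinary stack add; the count
of two simply makes the same op-code fire twice without a second
instruction being fetched.
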
Occasionally a PUSH instruction with a higher repetition count
would be used to move a contiguous block of data around (block move).

Subroutine calls simply involved pushing the new PC value on to the
system stack, once the calling instruction (presumably with a
repetition count of 1) had been popped from the top (i.e. had finished
executing in the IR).  Loops were then implemented using this
subroutine-call mechanism, with the appropriate iteration count set in
the calling instruction (some sort of bit twiddling of fields would be
required if anything other than a compile-time constant number of
iterations was required).

Its designer (Prof Osmon of City University, in London) named it CAC
(clean architecture computer -- due to its minimal and regular
hardware) just before the RISC label became fashionable.

Has anyone any thoughts on how this could be put into practical use?
Or, indeed, has anyone ever seen a real example embodying any of these
principles?
--
Malcolm SHUTE.         (The AM Mollusc:   v_@_ )        Disclaimer: all