Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!sdd.hp.com!think.com!mintaka!bloom-beacon!eru!hagbard!sunic!mcsun!ukc!mucs!mshute
From: mshute@cs.man.ac.uk (Malcolm Shute)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <2518@m1.cs.man.ac.uk>
Date: 10 May 91 10:02:38 GMT
References: <12739@pt.cs.cmu.edu> <41612@cup.portal.com> <63942@bbn.BBN.COM>
Sender: news@cs.man.ac.uk
Reply-To: mshute@cs.man.ac.uk (Malcolm Shute)
Organization: Department of Computer Science, University of Manchester UK
Lines: 45

In article <63942@bbn.BBN.COM> pplacewa@bbn.com (Paul W Placeway) writes:
>What I want in my processor is a zero-overhead-per-loop down-counting
>loop instruction.  The TMS320 series, and Motorola DSP 56000 and 96000
>have had this sort of thing for quite a while.  For those of you who
>happen to be unfamiliar, the idea is that the PC addressing hardware
>has a loop beginning, end, and count register and the hardware does a
>decrement-branch-nonzero when the PC == end-of-loop, resetting it to
>beginning-of-loop, while in the instruction fetch stage.
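
For anyone who prefers code to prose, here is a toy Python model of that
mechanism as I read it. It is only a sketch: the function name, the
argument names and the list-of-labels "program" are all my own invention,
and it says nothing about how a TMS320 or a 56000 really implements it.

```python
def run_hw_loop(program, loop_begin, loop_end, loop_count):
    """Return the sequence of instructions fetched.

    program    -- list of instruction labels (hypothetical representation)
    loop_begin -- index of the first instruction in the loop body
    loop_end   -- index of the last instruction in the loop body
    loop_count -- number of times the body should run
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])
        # The check happens as part of fetch, so the loop body itself
        # contains no decrement or branch instruction.
        if pc == loop_end and loop_count > 0:
            loop_count -= 1
            if loop_count != 0:
                pc = loop_begin   # reset the PC to beginning-of-loop
                continue
        pc += 1
    return trace


if __name__ == "__main__":
    print(run_hw_loop(["init", "a", "b", "done"], 1, 2, 3))
```

The two-instruction body runs three times, with no loop-control
instruction appearing in the fetched stream.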

One interesting[*] idea which I saw once
[[*]where "interesting" == "novel", "neat" and "mind-stimulating",
but probably != "efficient", "commercially cost effective"]
made the instruction set completely block structured:
the fetch part of the instruction cycle worked with one
stack (with the IR and PC as the top-of-stack and next-on-stack,
respectively); and the execute part of the instruction cycle
worked with the other (i.e. arithmetic) stack (again with the
top two items in registers... effectively the ACC and ARG registers
of the machine).  Every instruction had a repetition field
which counted down to zero, while the op-code part of the instruction
was repeatedly obeyed this number of times.
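
A rough Python sketch of the repetition field, as far as I remember the
idea. The opcodes (ONE, DUP, ADD) and the (opcode, repeat) tuple format
are made up for illustration; I do not recall the real CAC instruction
set, so treat all of this as assumption.

```python
def execute(program):
    """Run a list of (opcode, repeat) pairs on an arithmetic stack.

    Each opcode is obeyed 'repeat' times before the next instruction
    is fetched. Opcodes here are hypothetical, not CAC's own.
    """
    stack = []
    for opcode, repeat in program:
        for _ in range(repeat):          # the repetition field counts down
            if opcode == "ONE":
                stack.append(1)          # push the constant 1
            elif opcode == "DUP":
                stack.append(stack[-1])  # duplicate top-of-stack
            elif opcode == "ADD":
                arg = stack.pop()        # the ARG register in the description
                acc = stack.pop()        # the ACC register
                stack.append(acc + arg)
            else:
                raise ValueError("unknown opcode: " + opcode)
    return stack


if __name__ == "__main__":
    # Push four 1s, then add three times.
    print(execute([("ONE", 4), ("ADD", 3)]))
```

Two instructions do the work of seven, which is the whole point of
attaching a count to every opcode.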

Most arithmetic instructions would just carry a repetition count of one,
and do their normal work on the arithmetic stack.  Occasionally a PUSH
instruction with a higher repetition count would be used to move a
contiguous block of data around (block move).
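
As a sketch of how such a block move might look, in Python: a PUSH
repeated n times, followed by a POP repeated n times. The matching
repeated POP and the stepping of the source and destination addresses
are my assumptions; I do not remember how the real design advanced the
address between repetitions.

```python
def block_move(memory, src, dst, count):
    """Copy 'count' words from src to dst via the arithmetic stack.

    Hypothetical model: one PUSH and one POP instruction, each carrying
    a repetition count of 'count'. Address stepping is assumed.
    """
    stack = []
    for i in range(count):             # PUSH, repeated 'count' times
        stack.append(memory[src + i])
    for i in reversed(range(count)):   # POP, repeated 'count' times;
        memory[dst + i] = stack.pop()  # stores run backwards to keep order
    return memory


if __name__ == "__main__":
    print(block_move([10, 20, 30, 0, 0, 0], 0, 3, 3))
```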

Subroutine calls simply involved pushing the new PC value onto the
system stack, once the calling instruction (presumably with a repetition
count of 1) had been POPped from the top (i.e. had finished executing in the IR).
Loops were then implemented using this subroutine-call mechanism, with
the appropriate iteration count set in the calling instruction (some sort
of bit twiddling of fields would be required if anything other than a
compile-time constant number of iterations was required).
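
Here is a small Python model of loops built from that call mechanism.
It is a simplification of my own: each frame on the fetch stack holds a
routine name, a PC and the pending instruction together, rather than
keeping the IR and PC as separate stack items, and the EMIT and CALL
opcodes are invented for the example.

```python
def run(routines, entry="main"):
    """Execute (op, arg, repeat) instructions using a fetch stack.

    routines -- dict mapping a routine name to its instruction list
    Returns the list of labels produced by EMIT (a hypothetical opcode).
    """
    trace = []
    # Frame layout: [routine name, pc, pending instruction or None]
    stack = [[entry, 0, None]]
    while stack:
        frame = stack[-1]
        name, pc, ir = frame
        if ir is None:
            body = routines[name]
            if pc == len(body):
                stack.pop()              # end of block: return to caller
                continue
            op, arg, repeat = body[pc]
            frame[1] = pc + 1
            ir = [op, arg, repeat]       # load the IR, count included
            frame[2] = ir
        if ir[2] == 0:
            frame[2] = None              # count exhausted: fetch the next one
            continue
        ir[2] -= 1                       # one repetition used up
        if ir[0] == "CALL":
            stack.append([ir[1], 0, None])   # push the new PC
        else:
            trace.append(ir[1])          # EMIT just records its label
    return trace


if __name__ == "__main__":
    routines = {
        "main": [("EMIT", "start", 1), ("CALL", "body", 3), ("EMIT", "end", 1)],
        "body": [("EMIT", "x", 1), ("EMIT", "y", 2)],
    }
    print(run(routines))
```

The CALL with a count of 3 is the loop: when the body returns, the
same CALL is still sitting in the IR with its count reduced by one, so
it is simply obeyed again.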

Its designer (Prof Osmon of City University, in London) named it CAC
(clean architecture computer -- due to its minimal and regular hardware)
just before the RISC label became fashionable.

Has anyone any thoughts on how this could be put into practical use?
Or, indeed, has anyone ever seen a real example embodying any of these
principles?
--

Malcolm SHUTE.         (The AM Mollusc:   v_@_ )        Disclaimer: all