Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!amdcad!mozart.amd.com!nucleus!davec
From: davec@nucleus.amd.com (Dave Christie)
Newsgroups: comp.arch
Subject: Re: taxonomy for superscalars/etc (LONG)
Message-ID: <1990Jul23.182546.25777@mozart.amd.com>
Date: 23 Jul 90 18:25:46 GMT
References: <9782@hubcap.clemson.edu>
Sender: usenet@mozart.amd.com (Usenet News)
Reply-To: davec@nucleus.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 115

In article <9782@hubcap.clemson.edu> mark@hubcap.clemson.edu (Mark Smotherman) writes:
>I would like to suggest a possible taxonomy to distinguish among
>the differing organizations of new processors.  Your comments
>and corrections are welcome.
>
>I see three major areas: issue parallelism, start of execution
>(which can differ from time of issue if it is the responsibility
>of the functional unit to obtain its own operands), and resource
>naming (specifically registers).
>
>Thus I would like to give a three-part classification, i/e/n,
>to each machine.  The first field would be issue parallelism:
>
>  (1) single issue per cycle
>  (s) superscalar issue of independent instructions
>  (v) vliw issue in which several opcodes or instructions (i.e. i860)
>      are grouped into a wide instruction word

How about adding

   (d) = decoupled superscalar issue of independent instructions,

where instructions are issued simultaneously from two independent
streams (Smith, James E., _Decoupled_Access/Execute_Computer_Architectures_,
ACM TOCS, Nov 1984, plus a half-dozen other papers by him).  This is
considerably different from the type of superscalar issue we are
beginning to see in advanced RISC chips, and has been used commercially
(Astronautics Corp's short-lived ZS-1(?) for one).

BTW, I'm not sure what your taxonomy is intended to cover - features
that are architecturally visible, or implementation techniques
regardless of whether they influence the architecture/compiler?
I assume the latter, since renaming is used to hide stuff from the
compiler.

>The second field would be execution start time:
>
>  (d) data dependencies (RAW) are interlocked by issue unit, which
>      stalls until resolution; the fn unit starts upon issue since
>      issue unit provides both the op specification as well as the
>      operands
>  (c) compiler must reorder instructions to avoid data dependencies
>  (x) out-of-order execution, where the fn unit starts only after
>      obtaining its operands -- the issue unit does not stall on
>      data dependencies but forces the fn unit to resolve them

These categories aren't mutually exclusive - the R3000, which you
classify as 1/c/p, is "d" for cache misses on a load, as well as for
accesses to the HI and LO registers during multiply/divide.  Should it
be classified as "dc", or where does one draw the line?

I also have a semantic bone of contention for these first two parts.
When instruction execution becomes separated from instruction "issue",
most literature I've seen refers to the act of sending an instruction
to a functional unit as "dispatch", with the term "issue" used to refer
to the act of placing an instruction in execution.  I don't think
"execution" can be used in place of "issue" here, since execution can
take multiple cycles, and issue (& dispatch) indicate single-cycle
events (to me, anyway).  This may seem pretty picky, but rigorous
definition of these terms definitely helps avoid confusion when working
with this stuff.

>The third field would be resource naming, specifically register
>renaming:
>
>  (p) physical registers named in instructions -- must be concerned
>      about anti-dependencies (WAR) and output dependencies (WAW)
>  (r) hardware renames logical registers in instructions by tagging
>      or assignment of physical registers (removes WAR and WAW)

A few comments:

How about a designation for cases where possible dependencies are
eliminated mainly by using independent register sets (ref.
i860 integer/fp registers, decoupled architectures)?  Dependencies
between functional units can be handled via queues between functional
units - this is typical for decoupled architectures (ref. aforementioned
paper).  Moreover, this technique can be used in single-stream
architectures with independent functional units, and the queues can be
architecturally visible (accessed via register designators) or
implemented using renaming (architecturally hidden), where the renaming
is only done for queued operands, not all operations.

I have seen implementations where the renaming is permanent, such that
the architectural state of a process is maintained by a set of pointers
into a pool of registers, and where it is only temporary (to cover the
average or maximum execution latency), with a set of physically-addressed
registers being updated in order from a reorder buffer or similar
mechanism.  (This is probably getting too detailed for your purposes.)

>Using this proposed taxonomy, I would classify the following machines:
>
>  Tandem Cyclone                  s / d / p

?? Maybe in a class by itself, considering how it relies on the compiler
to pair up instructions, which are then converted to a vliw-like micrand
via the control store.

>  IBM S/360 M91 (Tomasulo)        1 / x / r
>  IBM RS/6000                     s / x / r

These two use renaming in considerably different ways: in the 360/91
the tags refer to specific physical registers, and not just the floating
point registers.  In the RS6000, at least from what I can tell from the
literature I have, the renaming is used primarily to implement a load
operand queue, as described above.  I don't know whether or not renaming
is used to handle intra-FPU conflicts.  I'm pretty sure renaming is not
used within the fixed-point unit.  (The RS6000 actually has several
attributes of the single-stream decoupled access/execute architecture
described in Jim Smith's aforementioned paper.)
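
For anyone who wants the "permanent" flavour of renaming in concrete
form, here is a rough Python sketch of a map table of pointers into a
register pool.  It is a toy under stated assumptions, not a model of the
360/91, the RS/6000, or any other machine named here: the names
(Renamer, rename_program), the pool sizes, and the (dest, srcs)
instruction encoding are all made up for illustration, and registers are
never returned to the free list.

```python
from collections import deque


class Renamer:
    """Toy rename stage: logical registers are pointers into a physical pool."""

    def __init__(self, n_logical, n_physical):
        # Assumption: logical register i starts out in physical register i.
        self.map = list(range(n_logical))
        # Remaining physical registers are available for new results.
        self.free = deque(range(n_logical, n_physical))

    def rename(self, dest, srcs):
        """Rename one instruction 'dest <- op(srcs)'."""
        # Sources use the current mapping, so true (RAW) dependencies survive.
        new_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; would have to stall")
        # A fresh destination means a later write cannot clobber a value an
        # earlier instruction still reads (WAR) or still has to write (WAW).
        new_dest = self.free.popleft()
        self.map[dest] = new_dest
        return new_dest, new_srcs


def rename_program(program, n_logical=4, n_physical=8):
    """Rename a list of (dest, srcs) instructions in program order."""
    renamer = Renamer(n_logical, n_physical)
    return [renamer.rename(dest, srcs) for dest, srcs in program]


# r1 <- r2,r3 ; r2 <- r1,r1 (WAR on r2) ; r1 <- r3,r3 (WAW on r1)
program = [(1, [2, 3]), (2, [1, 1]), (1, [3, 3])]
for dest, srcs in rename_program(program):
    print(f"p{dest} <- " + ", ".join(f"p{s}" for s in srcs))
```

After renaming, the three instructions write three different physical
registers, so only the RAW dependency of the second instruction on the
first is left.  The "temporary" scheme would additionally copy results
back, in order, to a fixed set of physically-addressed registers.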

In any case, you should probably indicate what level of detail you wish
to represent with this taxonomy.  One can get really carried away with
this stuff [...he says nonchalantly, concluding the longest posting of
his life...:-)].

----------------------------------
Dave Christie             My opinions only.