Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!pasteur!ucbvax!decwrl!pyramid!prls!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: Proposed architecture characterization survey form
Message-ID: <2048@gumby.mips.COM>
Date: 19 Apr 88 07:49:02 GMT
Lines: 154

Now that Motorola has announced the 88000, I believe all the
commercial "RISC"s are out in the open (or am I missing something?).
This list includes the MIPS R2000/R3000, the Fairchild Clipper, the
IBM RT, the HP Precision, the AMD 29000, Sun SPARC, Intel 80960, and
Motorola 88000 (speak up if I left anyone out!).

I propose that comp.arch develop a standard form for describing "RISC"
architectures and apply it to the above.  (We could include military
and research machines as well, if people so desire.)  Below I propose
such a form, which will, no doubt, require generalization.  Once we
agree on what it takes to fairly well characterize an architecture and
its implementation, we can fill in the answers for all of the above
(unless people think this is a worthless exercise?).

First some definitions of my terminology are in order, because it's
probably different from everyone else's.  The latency of an operation
is the time it takes for the entire operation to complete.  The issue
time is the time before you can start the next instruction, and the
rate is the time until you can start another instruction of the same
type.  For example, a machine might require 3 cycles for a load
instruction: 1 to calculate the address, 2 to access the cache, and
allow a new load every 2 cycles, but allow a non-load to start
immediately.  I describe this load as 3/1/2.

What is commonly called the load delay (as opposed to latency) is the
time after the load before you can reference the result.  This is the
latency minus the issue time (3 - 1 = 2) in this case.  Don't confuse
latency with delay.
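To make the latency/issue/rate notation concrete, here is a minimal
Python sketch.  It is only an illustration of the definitions above,
not part of the proposed form.  The names Timing, parse, delay, and
back_to_back_cycles are hypothetical, and the back-to-back estimate
assumes that nothing other than the rate limits issue (no cache
misses or other stalls) and that the issue time does not exceed the
rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """A latency/issue/rate triple, all in clock cycles (hypothetical helper)."""
    latency: int  # cycles until the whole operation completes
    issue: int    # cycles before the next instruction of any type can start
    rate: int     # cycles before another instruction of the same type can start

    @classmethod
    def parse(cls, text: str) -> "Timing":
        """Parse the 'latency/issue/rate' notation, e.g. '3/1/2'."""
        latency, issue, rate = (int(field) for field in text.split("/"))
        return cls(latency, issue, rate)

    @property
    def delay(self) -> int:
        """Delay as defined in the post: latency minus issue time."""
        return self.latency - self.issue

    def back_to_back_cycles(self, count: int) -> int:
        """Cycles to finish `count` same-type operations started as fast as
        the rate allows. Assumption: no other stalls, and issue <= rate."""
        if count <= 0:
            return 0
        # The last operation starts after (count - 1) rate intervals,
        # then needs its full latency to complete.
        return (count - 1) * self.rate + self.latency


load = Timing.parse("3/1/2")
print(load.delay)                   # 2: the load delay from the example
print(load.back_to_back_cycles(4))  # 9: loads start at cycles 0, 2, 4, 6
```

Keeping delay as a derived property rather than a stored field
mirrors the text: it is defined entirely by latency and issue time.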
Some latency/issue/rate examples from the Cray-1S (from memory, so
don't quote me):

	logicals:	 1/1/1
	shift:		 2/1/1
	integer add:	 3/1/1
	load:		11/2/2

An example of a multi-cycle latency, non-pipelined floating point unit
might have:

	add:		 2/1/2
	mul:		 4/1/4

I hope that is clear enough.  If not, I'll try to clarify.

Here is my proposed form to characterize architectures and their
implementations.  I'll post the MIPSco numbers once we agree on the
data to collect.

> Peak native MIPS
	What is the clock cycle time?
	What is the peak native MIPS rate?

> Implementation technology
	What are the parameters of the implementation technology?

> Instruction format
	What instruction sizes are used?
	What size are immediate operands?
	What size are branch displacements?

> Integer Registers
	How are the registers organized [simple, windowed]?
	How many total integer registers?
	Hardwired zero register?
	For windowed machines:
		How many registers are addressed by an instruction?
		How many of these are not windowed?
		What window increments are supported?
		Window overflow and underflow are handled in
			[software, hardware]?

> Integer Alu
	What is the logical latency/issue/rate?
	What is the shift latency/issue/rate?
	What is the add latency/issue/rate?
	What is the compare latency/issue/rate?

> Branches
	Which operand comparisons are implemented in the conditional
		branch instruction, and which require a separate
		instruction?
	Where is the result of separate comparisons stored
		[registers, condition codes]?
	Which forms of branch delay are present in instruction set
		[execute N if no branch, execute N if branch,
		execute N always]?
	What are the taken and not-taken cycle counts for each
		branch type?

> Loads/Stores
	What addressing mode(s) do load instructions use?
	What addressing mode(s) do store instructions use?
	Which load/store sizes are supported [8, 16, 32, 64]?
	What is the load latency/issue/rate?
	What is the store latency/issue/rate?

> Integer Multiply/Divide
	How is multiply implemented
		[software, multiply step, hardware]?
	How many cycles to perform 32x32->32 multiply?
	How is divide implemented
		[software, divide step, hardware]?
	How many cycles to perform 32x32->32 divide?

> Floating Point
	Are floating point registers separate from integer registers?
	How many 32-bit floating point registers?
	How many 64-bit floating point registers?
	How many 80-bit floating point registers?
	How is floating point implemented
		[software, coprocessor, on-chip]?
	What are the floating point operation latency/issue/rates?
			32-bit	64-bit	80-bit
		add
		mul
		div
	Which floating point units can operate in parallel?
	Can floating point operate in parallel with integer?
	Are floating point exceptions precise?

> Memory management
	Page size?
	Translation cache [none, off-chip, on-chip]?
	Translation cache size in entries?
	Translation cache associativity
		[direct-mapped, 2-set, 4-set, full]?
	Translation cache miss handled by [software, hardware]?

> Caches
	Instruction cache [none, off-chip, on-chip]?
	Data cache [none, off-chip, on-chip]?
	Are I and D caches separate?
	I-cache total size in bytes?
	I-cache associativity
		[direct-mapped, 2-set, 4-set, fully associative]?
	I-cache address block size in bytes (bytes per tag)?
	I-cache transfer block size in bytes
		(bytes read on cache miss)?
	I-cache index [virtual, physical]?
	I-cache tag [virtual, physical]?
	D-cache total size in bytes?
	D-cache associativity
		[direct-mapped, 2-set, 4-set, fully associative]?
	D-cache writes [write-through, write-back]?
	D-cache address block size in bytes (bytes per tag)?
	D-cache transfer block size in bytes
		(bytes read on cache miss)?
	D-cache index [virtual, physical]?
	D-cache tag [virtual, physical]?
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086