Path: utzoo!attcan!uunet!husc6!bbn!uwmcsd1!ig!agate!ucbvax!decwrl!pyramid!prls!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: architecture/implementation -- 88000
Message-ID: <2232@gumby.mips.COM>
Date: 24 May 88 05:22:55 GMT
Lines: 209

(See previous postings for background.)
(Thanks to Andrew Klossner for his help on this one.)

> Architecture Reference

Where is the architecture fully described?
        -- Technical Summary: 32-Bit Concurrent RISC Microprocessor
           (27-page data sheet)
        -- MC78000 User's Manual Revision 0.4, October 7, 1987,
           Advanced Information
           (100+ page document, describes registers, instructions,
           exception processing, and timing information in detail;
           it has no doubt been renamed by now)
        -- Technical Summary: 32-Bit Cache/Memory Management Unit (CMMU)
           (19-page data sheet)
        -- MC78200 Cache and Memory Management Unit (CMMU)
           Architecture Spec. version 2.0, November 3, 1986,
           Advanced Information
           (80-page document, describes pre-production CMMU in detail)
        -- MC78200 User's Manual Revision 0.1, November 29, 1987,
           Advanced Information
           (80+ page document, like above but includes architecture
           changes which will appear in the production chip)

> Peak native MIPS

What is the clock cycle time?
        20MHz (50ns)
What is the peak native MIPS rate?
        20mips

> Implementation technology

What are the parameters of the implementation technology?
        1.5micron CMOS
How many chips of what kinds to build a cpu subsystem?
        1 88100
        2-8 88200s
How many pins on those chips?
        Each chip is in a 17 pin by 17 pin package, 181 pins apiece.

> Instruction format

What instruction sizes are used?
        32 bits
What size are immediate operands?
        16 bits
What size are branch displacements?
        16 bits (+-128KB)
What size are unconditional branch and call displacements?
        26 bits (+-128MB)

> Integer Registers

How are the registers organized [simple, windowed]?
        simple
How many total integer registers?
        32 32-bit registers
Hardwired zero register?
        yes, r0
        4 registers reserved for linker

> Integer Alu

What is the logical latency/issue/repeat?
        1/1/1
What is the shift latency/issue/repeat?
        1/1/1
What is the add latency/issue/repeat?
        1/1/1
What is the compare latency/issue/repeat?
        1/1/1
How is 64 bit (signed/unsigned) integer addition supported and how many
cycles?
        An "addu.co" instruction followed by an "add.ci" or "addu.ci"
        instruction.  Each is 1/1/1 for a total of 2/2/2.

> Branches

Which operand comparisons are implemented in the conditional branch
instruction, and which require a separate instruction?
        branch instructions:
                = 0, != 0, > 0, < 0, >= 0, <= 0
                bit set, bit clear
        Everything else requires a separate compare instruction.
Where is the result of separate comparisons stored [registers,
condition codes]?
        registers
Which forms of branch delay are present in instruction set [execute N
if no branch, execute N if branch, execute N always]?
        execute 1 always and execute 1 if no branch
What are the taken and not-taken cycle counts for each branch type, not
including the N delayed instructions, if executed?
        execute 1 always: 1 cycle, taken or not
        execute 1 if no branch: 1 cycle untaken, 2 cycles taken

> Loads/Stores

What addressing mode(s) do load instructions use?
        register + 16-bit unsigned displacement
        register + register
        register + register*size
What addressing mode(s) do store instructions use?
        same
Which load/store sizes are supported [8, 16, 32, 64]?
        8, 16, 32, 64
What is the load latency/issue/repeat?
        3/1/1 for 8-32, 4/2/2 for 64
What is the store latency/issue/repeat?
        1/1/1 for 8-32, 2/2/2 for 64

> Integer Multiply/Divide

How is multiply implemented [software, multiply step, hardware]?
        hardware
How many cycles to perform 32x32->32 multiply?
        4/1/1
How is divide implemented [software, divide step, hardware]?
        hardware
How many cycles to perform 32x32->32 divide?
        39/1/39
        Signed divide traps on negative operand.
How is 32x32->64 bit integer multiplication supported and how many
cycles?
        Software.
        No cycle count estimate.
How is 64/32->32,32 bit integer division supported and how many cycles?
        Software.
        No cycle count estimate.

> Floating Point

Are floating point registers separate from integer registers?
        no
How many 32-bit floating point registers?
        32
How many 64-bit floating point registers?
        16
How many 80-bit floating point registers?
        0
How is floating point implemented [software, coprocessor, on-chip]?
        on-chip
What are the floating point operation latency/issue/repeats?
                32-bit          64-bit          80-bit
        add      5/ 1/ 1         6/ 2/ 2        n.a.
        mul      5/ 1/ 1        10/ 2/ 2        n.a.
        div     30/ 1/30        60/ 2/60        n.a.
        sqrt    n.a.            n.a.            n.a.
Which floating point units can operate in parallel?
        add and multiply
Can floating point operate in parallel with integer?
        yes
Are floating point exceptions precise?
        some but not all

> Memory management

Page size in bytes?
        4096
How many bits in a virtual address?
        32
What is the size of the user-mode address space?
        4G
        There can be two user-mode address spaces, each 4G, if you
        want to split I&D.
How many bits in a physical address?
        32
How many bits of address space id are added to virtual addresses, if
any?
        0
Translation cache [none, off-chip, in-cache, on-chip]?
        in-cache
Translation cache size in entries?
        56
Translation cache associativity [direct-mapped, 2-set, 4-set, full]?
        full
Translation cache miss handled by [software, hardware]?
        hardware
        Also 10 512Kbyte software-managed translation entries.

> Caches

Instruction cache [none, off-chip, on-chip]?
        off-chip
Data cache [none, off-chip, on-chip]?
        off-chip
Are I and D caches separate?
        yes
I-cache total size in bytes?
        16K to 64K
I-cache associativity [direct-mapped, 2-set, 4-set, fully associative]?
        4-set
I-cache address block size in bytes (bytes per tag)?
        16
I-cache transfer block size in bytes (bytes read on cache miss)?
        16
I-cache index [virtual, physical]?
        virtual
        The distinction only matters when there is more than one CMMU
        on a memory port.  When there's just one, the index is both
        virtual and physical.
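
The remark about the index being "both virtual and physical" with a single CMMU can be checked with a little arithmetic on the figures given above (4096-byte pages, 4-set associativity, 16-byte blocks). What follows is only a sketch, not anything from the Motorola documents. It assumes that the 16K figure is the capacity of one CMMU, and it models four CMMUs on a port as a single 64K cache whose extra select bits come straight from the address, which is a guess at how CMMUs are selected. The function names are made up for illustration.

```python
# Illustrative arithmetic only; function and constant names are hypothetical.
PAGE_SIZE = 4096         # bytes, from the Memory management section
CMMU_CACHE_SIZE = 16384  # bytes; ASSUMPTION: 16K is the one-CMMU capacity
WAYS = 4                 # 4-set associative
BLOCK_SIZE = 16          # bytes per tag


def index_bit_range(cache_size, ways, block_size):
    """Return (lowest, highest) address bit numbers that select a set."""
    sets = cache_size // (ways * block_size)
    offset_bits = block_size.bit_length() - 1  # log2(block size)
    set_bits = sets.bit_length() - 1           # log2(number of sets)
    return offset_bits, offset_bits + set_bits - 1


def index_within_page_offset(cache_size, ways, block_size, page_size):
    """True when every index bit lies in the untranslated page offset."""
    _, high = index_bit_range(cache_size, ways, block_size)
    page_offset_bits = page_size.bit_length() - 1  # log2(page size)
    return high < page_offset_bits


if __name__ == "__main__":
    # One CMMU versus four CMMUs modelled as one larger cache (assumption).
    for label, size in (("one CMMU", CMMU_CACHE_SIZE),
                        ("four CMMUs", 4 * CMMU_CACHE_SIZE)):
        low, high = index_bit_range(size, WAYS, BLOCK_SIZE)
        same = index_within_page_offset(size, WAYS, BLOCK_SIZE, PAGE_SIZE)
        print(f"{label}: index bits {low}..{high}, "
              f"virtual index == physical index: {same}")
```

Under these assumptions the single-CMMU index occupies address bits 4 through 11, all inside the 12-bit page offset that translation leaves alone, so virtual and physical indexing pick the same set. In the four-CMMU model the selection reaches bits 12 and 13, which translation can change, and that is where the distinction starts to matter.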
I-cache tag [virtual, physical]?
        physical
D-cache total size in bytes?
        16K to 64K
D-cache associativity [direct-mapped, 2-set, 4-set, fully associative]?
        4-set
D-cache writes [write-through, write-back]?
        write-through or write-back
D-cache address block size in bytes (bytes per tag)?
        16
D-cache transfer block size in bytes (bytes read on cache miss)?
        16
D-cache index [virtual, physical]?
        virtual
        See comment for I-cache index.
D-cache tag [virtual, physical]?
        physical
Is there a secondary cache?
        no

> Branch Prediction

What form of branch prediction is used, if any?
        none

> Other

Describe other unique or interesting features of the architecture or
its implementation.  E.g. describe the functional units, with emphasis
on non-standard units.

        There are four 32-bit scratch "control" registers available in
        supervisor mode.

        There's a user-writable "floating point control register" with
        bits like "disable divide-by-zero exception", "disable overflow
        exception", and so on.  The bits are not interpreted by the
        hardware; the exception always occurs, and it's up to the
        kernel to fix up the imprecise result and make it appear to
        the user as though the exception hadn't occurred.  The kernel
        does all the right IEEE things, including implementing
        not-a-number.

        There's an instruction to trap on subscript out of range.

        A bit in the PSR selects whether the data space is big-endian
        or little-endian.

        The instruction and data pipelines are exposed to software.
        Exception handling involves a lot of overhead; the code has to
        deal with up to six outstanding user page faults and up to
        nine outstanding floating point exceptions.  You can't just
        duck in and out of a device interrupt routine and then return
        with RTE.
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086