Path: utzoo!mnetor!uunet!husc6!bloom-beacon!mit-eddie!uw-beaver!tektronix!orca!tekecs!frip!andrew
From: andrew@frip.gwd.tek.com (Andrew Klossner)
Newsgroups: comp.arch
Subject: hard data on Motorola 88000
Message-ID: <9916@tekecs.TEK.COM>
Date: 18 Apr 88 19:44:54 GMT
Sender: nobody@tekecs.TEK.COM
Lines: 82

The announcement is today, so I guess it's okay to talk hard data on the Motorola 88000 architecture.

The 88100, the CPU chip, includes a floating point processor. The 88200 is the CMMU (cache/memory management unit). The CPU uses a Harvard architecture (separate memory ports for instruction and data) so a minimum configuration is one CPU and 2 CMMUs. It cycles at 20MHz initially, with 25MHz expected before long.

The CPU itself, excluding the floating point unit, looks much like everybody else's RISC CPU. There are 32 registers, with r0 hardwired to zero. (No register windows.) There is hardware stalling on a register scoreboard. ALU instructions take three register addresses, two operands and a destination. They all execute in one cycle, except for integer multiply/divide. (There is result forwarding, so a destination register can be used in the next instruction without stalling.)

Load/store instructions can take a 16 bit offset and an index register, which can be scaled by a factor of 1, 2, 4, or 8. To get to an arbitrary 32-bit address, you need two instructions:

        or.u    r2,r0,hi16(address)     ; high 16 bits of address to r2
        ld      r2,r2,lo16(address)     ; load word into r2

There is a three-deep pipeline for instruction fetch and a three-deep pipeline for data fetch/store. Branch instructions have one delay slot, and each branch instruction has a bit which means execute the instruction in the delay slot before branching. Load instructions take three cycles if the target memory location is already in cache. Store instructions get started in one cycle if the data pipeline isn't full, otherwise they stall.
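
The two-instruction address sequence shown earlier (or.u followed by ld) splits a 32-bit address into two 16-bit halves. Here is a minimal sketch of that arithmetic in Python. It assumes hi16 and lo16 simply mean the upper and lower 16 bits, and that the load offset is zero-extended; the post doesn't state the extension rule, so treat that as a guess. The function names only mirror the assembler operators and are not taken from any Motorola tool.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def hi16(address: int) -> int:
    """Upper 16 bits of a 32-bit address."""
    return (address >> 16) & MASK16


def lo16(address: int) -> int:
    """Lower 16 bits of a 32-bit address."""
    return address & MASK16


def or_u(src: int, imm16: int) -> int:
    """Model of or.u: OR a 16-bit immediate into the upper half of src."""
    return (src | ((imm16 & MASK16) << 16)) & MASK32


def effective_address(base: int, offset16: int) -> int:
    """Model of the ld address calculation.

    Assumption: the 16-bit offset is zero-extended (treated as unsigned).
    """
    return (base + (offset16 & MASK16)) & MASK32


def address_via_two_instructions(address: int) -> int:
    """Rebuild a 32-bit address the way the or.u / ld pair would."""
    r0 = 0                            # r0 is hardwired to zero
    r2 = or_u(r0, hi16(address))      # or.u r2,r0,hi16(address)
    return effective_address(r2, lo16(address))  # ld r2,r2,lo16(address)
```

Under these assumptions, address_via_two_instructions(a) returns a unchanged for any 32-bit a, which is the whole point of the split. If the offset were sign-extended instead, hi16 would need an adjustment whenever bit 15 of the address is set.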

The on-chip floating point unit implements floating point add/subtract/multiply/divide/compare and integer multiply/divide. Floating point instructions can freely mix single and double precision, which are the usual IEEE format 32- and 64-bit words. The add/subtract portion is separate from the multiply portion and both are pipelined, so, for example, there can be three multiplies going at one time. But the divide instruction takes over the whole FP unit and iterates through it. Integer multiply takes 4 cycles; integer divide takes 39. Single precision add/sub/cmp/mul/convert takes 5 cycles; single divide takes 30; double add/sub/cmp/convert takes 6; double mul takes 10; double divide takes 60. Curiously, an integer divide with a negative operand traps and makes the kernel complete the operation; I guess Motorola just ran out of silicon.

Each CMMU has 16k bytes of RAM, organized as a 4-way set associative cache. You can have as many as 4 CMMUs on each memory port. The cache is by physical addresses, and the cache lookup, hashed on offset within page, proceeds in parallel with the logical to physical address translation to get the speed up.

The MMU is a subset of Motorola's PMMU chip, with the usual two-level page tables and all the necessary bits (referenced, dirty, etc) in the page descriptor words. The CMMU includes a page address translation cache which can describe 56 entries, and a block address translation cache which can be used to avoid page table walks for memory that's locked down, like kernel code and data.

A cache line is 16 bytes. On a cache miss during fetch, the whole line must be loaded from memory before the fetch is satisfied. On a cache miss during store, the whole line is loaded, then the modified word is written to memory; a cache hit during store does not cause the word to be written.

The CMMUs include logic to do bus snooping and maintain cache coherency, so you can throw several CPU/CMMU lashups onto the same memory bus.
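
The cache numbers above (16k bytes, 4-way, 16-byte lines) are enough to work out why the lookup can overlap with address translation. This is a rough sketch of that arithmetic in Python. The 4 KB page size is an assumption, since the post doesn't give one, and the "hash" is modelled as a plain bit field taken from the address, which may be simpler than what the chip really does.

```python
CACHE_BYTES = 16 * 1024   # per CMMU, from the post
WAYS = 4                  # 4-way set associative, from the post
LINE_BYTES = 16           # cache line size, from the post
PAGE_BYTES = 4096         # ASSUMPTION: page size is not stated in the post


def log2_exact(n: int) -> int:
    """Return log2(n) for a positive power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


SETS = CACHE_BYTES // (WAYS * LINE_BYTES)    # number of sets in the cache
OFFSET_BITS = log2_exact(LINE_BYTES)         # bits selecting a byte in a line
INDEX_BITS = log2_exact(SETS)                # bits selecting a set
PAGE_OFFSET_BITS = log2_exact(PAGE_BYTES)    # bits untouched by translation


def set_index(address: int) -> int:
    """Pick the set for an address (assumed: simple bit-field index)."""
    return (address >> OFFSET_BITS) & (SETS - 1)


def index_fits_in_page_offset() -> bool:
    """True if the set index uses only bits that translation leaves alone."""
    return OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS
```

With these figures there are 256 sets, so the line offset plus set index occupy the low 12 bits of the address. If pages really are 4 KB, those bits are identical in the logical and physical address, so the set can be selected before translation finishes and only the tag compare has to wait for the physical page number.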
Motorola is playing up the multiprocessor support in their advertising, claiming 17 MIPS for one CPU and 50 MIPS for a multi-CPU system.

Unix System V Release 3 is up and running (single-CPU). A reference port will be sold by either Motorola or Unisoft. A binary compatibility standard, which eventually will be blessed by AT&T and be an ABI, is coming along.

We at Tektronix have been designing a workstation around this chip set for several months. I like it. Don't ask me what price or availability are; I don't know the answers for the general public. As a member of the 88open consortium, Tektronix negotiated favorable terms.

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]