Path: utzoo!attcan!uunet!husc6!bbn!uwmcsd1!ig!agate!ucbvax!decwrl!pyramid!prls!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: architecture/implementation -- 88000
Message-ID: <2232@gumby.mips.COM>
Date: 24 May 88 05:22:55 GMT
Lines: 209

(See previous postings for background.)

(Thanks to Andrew Klossner for his help on this one.)

> Architecture Reference

Where is the architecture fully described?

  -- Technical Summary: 32-Bit Concurrent RISC Microprocessor (27-page
     data sheet)

  -- MC78000 User's Manual Revision 0.4, October 7, 1987, Advanced
     Information (100+ page document, describes registers,
     instructions, exception processing, and timing information in
     detail; it has no doubt been renamed by now)

  -- Technical Summary: 32-Bit Cache/Memory Management Unit (CMMU)
     (19-page data sheet)

  -- MC78200 Cache and Memory Management Unit (CMMU) Architecture Spec.
     version 2.0, November 3, 1986, Advanced Information (80-page
     document, describes pre-production CMMU in detail)

  -- MC78200 User's Manual Revision 0.1, November 29, 1987, Advanced
     Information (80+ page document, like above but includes
     architecture changes which will appear in the production chip)

> Peak native MIPS

What is the clock cycle time? 20MHz (50ns)
What is the peak native MIPS rate? 20mips

> Implementation technology

What are the parameters of the implementation technology? 1.5micron CMOS
How many chips of what kinds to build a cpu subsystem?
	1 88100
	2-8 88200s
How many pins on those chips?
	Each chip is in a 17 pin by 17 pin package, 181 pins apiece.

> Instruction format

What instruction sizes are used? 32 bits
What size are immediate operands? 16 bits
What size are branch displacements? 16 bits (+-128KB)
What size are unconditional branch and call displacements? 26 bits (+-128MB)
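
The reaches quoted above follow if the displacement fields are signed
and count 32-bit instruction words rather than bytes.  That is an
assumption inferred from the figures, not something stated here, and
branch_reach_bytes is a hypothetical helper name.  A minimal Python
sketch of the arithmetic:

```python
def branch_reach_bytes(field_bits, word_bytes=4):
    """Reach in bytes, in each direction, of a signed word displacement.

    Assumption (not stated in the posting): the field is two's-complement
    and counts instruction words of word_bytes bytes each.
    """
    return (1 << (field_bits - 1)) * word_bytes


# Conditional branches: 16-bit field.
cond_kb = branch_reach_bytes(16) // 1024
# Unconditional branches and calls: 26-bit field.
uncond_mb = branch_reach_bytes(26) // (1024 * 1024)
print(cond_kb, "KB;", uncond_mb, "MB")
```

Under those assumptions the 16-bit and 26-bit fields give the 128KB
and 128MB figures listed.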

> Integer Registers

How are the registers organized [simple, windowed]? simple
How many total integer registers? 32 32-bit registers
Hardwired zero register? yes, r0

4 registers reserved for linker

> Integer ALU

What is the logical latency/issue/repeat? 1/1/1
What is the shift latency/issue/repeat? 1/1/1
What is the add latency/issue/repeat? 1/1/1
What is the compare latency/issue/repeat? 1/1/1
How is 64 bit (signed/unsigned) integer addition supported and how many cycles?
	An "addu.co" instruction followed by an "add.ci" or "addu.ci"
	instruction.  Each is 1/1/1 for a total of 2/2/2.
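
The two-instruction sequence is the usual carry chain: the first add
produces a carry out of the low word and the second consumes it in the
high word.  A rough Python model follows.  The function names are
hypothetical, and it models only the carry; it does not model whatever
distinguishes the signed add.ci from the unsigned addu.ci.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a, b, carry_in=0):
    """One 32-bit add: returns (32-bit sum, carry out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit add built from two 32-bit adds."""
    lo, carry = add_with_carry(a_lo, b_lo)      # plays the role of addu.co
    hi, _ = add_with_carry(a_hi, b_hi, carry)   # plays the role of addu.ci
    return hi, lo


hi, lo = add64(0x00000000, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(hex(hi), hex(lo))
```

The example adds 1 to 0x00000000FFFFFFFF, so the carry out of the low
word is what makes the high word nonzero.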

> Branches

Which operand comparisons are implemented in the conditional branch
instruction, and which require a separate instruction?
	branch instructions: = 0, != 0, > 0, < 0, >= 0, <= 0
			bit set, bit clear
	Everything else requires a separate compare instruction.

Where is the result of separate comparisons stored [registers,
condition codes]? registers

Which forms of branch delay are present in instruction set
[execute N if no branch, execute N if branch, execute N always]?
	execute 1 always and execute 1 if no branch
What are the taken and not-taken cycle counts for each branch type,
not including the N delayed instructions, if executed?
	execute 1 always: 1 cycle, taken or not
	execute 1 if no branch: 1 cycle untaken, 2 cycles taken
	
> Loads/Stores

What addressing mode(s) do load instructions use?
	register + 16-bit unsigned displacement
	register + register
	register + register*size
What addressing mode(s) do store instructions use?
	same
Which load/store sizes are supported [8, 16, 32, 64]? 8, 16, 32, 64
What is the load latency/issue/repeat? 3/1/1 for 8-32, 4/2/2 for 64

What is the store latency/issue/repeat? 1/1/1 for 8-32, 2/2/2 for 64
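
The three addressing modes can be written out as effective-address
calculations.  This is only a sketch: the function names are
hypothetical, and it assumes 32-bit wraparound and that "size" means
the access width in bytes (1, 2, 4, or 8 for the 8- to 64-bit sizes
listed), neither of which is spelled out above.

```python
MASK32 = 0xFFFFFFFF


def ea_displacement(base, disp16):
    """register + 16-bit unsigned displacement (zero-extended)."""
    if not 0 <= disp16 <= 0xFFFF:
        raise ValueError("displacement must fit in 16 unsigned bits")
    return (base + disp16) & MASK32


def ea_indexed(base, index):
    """register + register."""
    return (base + index) & MASK32


def ea_scaled(base, index, size):
    """register + register*size, with size the access width in bytes."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    return (base + index * size) & MASK32


# Element 3 of an array of 64-bit values starting at 0x1000.
print(hex(ea_scaled(0x1000, 3, 8)))
```

The scaled mode lets an array subscript be used directly as the index
register without a separate shift.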

> Integer Multiply/Divide

How is multiply implemented [software, multiply step, hardware]? hardware
How many cycles to perform 32x32->32 multiply? 4/1/1

How is divide implemented [software, divide step, hardware]? hardware
How many cycles to perform 32x32->32 divide? 39/1/39
	Signed divide traps on negative operand.

How is 32x32->64 bit integer multiplication supported and how many cycles?
	Software.  No cycle count estimate.

How is 64/32->32,32 bit integer division supported and how many cycles?
	Software.  No cycle count estimate.
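
One common way to do the 32x32->64 multiply in software, given only a
32x32->32 hardware multiply, is to split each operand into 16-bit
halves so that no partial product overflows 32 bits.  The Python
sketch below illustrates that general technique for unsigned
operands; it is not taken from any 88000 library, the names are
hypothetical, and no cycle count is implied.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32(a, b):
    """Model of a 32x32->32 multiply: keeps only the low 32 bits."""
    return (a * b) & MASK32


def umul64(a, b):
    """Unsigned 32x32->64 multiply; returns (high word, low word)."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Each 16x16 product fits in 32 bits, so mul32 loses nothing.
    ll = mul32(a_lo, b_lo)
    lh = mul32(a_lo, b_hi)
    hl = mul32(a_hi, b_lo)
    hh = mul32(a_hi, b_hi)

    # Sum the column covering bits 16..31; anything above 16 bits is carry.
    mid = (ll >> 16) + (lh & MASK16) + (hl & MASK16)
    lo = (ll & MASK16) | ((mid & MASK16) << 16)
    hi = (hh + (lh >> 16) + (hl >> 16) + (mid >> 16)) & MASK32
    return hi, lo


hi, lo = umul64(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))
```

That is four multiplies plus a handful of shifts, masks, and adds.  A
signed version would need a correction step on top of this.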

> Floating Point

Are floating point registers separate from integer registers? no
How many 32-bit floating point registers? 32
How many 64-bit floating point registers? 16
How many 80-bit floating point registers? 0

How is floating point implemented [software, coprocessor, on-chip]? on-chip
What are the floating point operation latency/issue/repeats?

		 32-bit		 64-bit		80-bit
	add	 5/ 1/ 1	 6/ 2/ 2	n.a.
	mul	 5/ 1/ 1	10/ 2/ 2	n.a.
	div	30/ 1/30	60/ 2/60	n.a.
	sqrt	n.a.		n.a.		n.a.

Which floating point units can operate in parallel? add and multiply
Can floating point operate in parallel with integer? yes
Are floating point exceptions precise? some but not all
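
The latency/issue/repeat triples can be turned into rough cycle
estimates.  The definitions come from the previous postings and are
not reproduced here, so the sketch below assumes that latency is the
number of cycles until a result is available and repeat is the minimum
spacing between successive operations of the same kind.  It also
assumes independent operations and no other stalls; the function name
is hypothetical.

```python
def back_to_back_cycles(latency, repeat, n):
    """Cycles until the last of n independent, same-kind ops completes.

    Assumption: the first result appears after `latency` cycles and each
    further operation can start `repeat` cycles after the previous one.
    """
    if n <= 0:
        return 0
    return latency + (n - 1) * repeat


# Entries from the table above (latency, repeat).
add32 = back_to_back_cycles(5, 1, 10)    # ten 32-bit adds, pipelined
mul64 = back_to_back_cycles(10, 2, 4)    # four 64-bit multiplies
div32 = back_to_back_cycles(30, 30, 3)   # three 32-bit divides, not pipelined
print(add32, mul64, div32)
```

Under these assumptions the divides cost their full latency each time,
while the adds and multiplies overlap.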

> Memory management

Page size in bytes? 4096
How many bits in a virtual address? 32
What is the size of the user-mode address space? 4G
	There can be two user-mode address spaces, each 4G, if you
	want to split I&D.

How many bits in a physical address? 32
How many bits of address space id are added to virtual addresses, if any? 0
Translation cache [none, off-chip, in-cache, on-chip]? in-cache
Translation cache size in entries? 56
Translation cache associativity [direct-mapped, 2-set, 4-set, full]? full
Translation cache miss handled by [software, hardware]? hardware

Also 10 512Kbyte software-managed translation entries.
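
For scale, the amount of memory the translation entries can map at
once follows directly from the numbers above.  A small Python sketch;
the names are hypothetical, and it simply multiplies entry counts by
the size each entry maps.

```python
PAGE_BYTES = 4096
PAGE_ENTRIES = 56            # hardware-filled translation cache entries
BLOCK_BYTES = 512 * 1024
BLOCK_ENTRIES = 10           # software-managed 512Kbyte entries


def reach_bytes(entries, bytes_per_entry):
    """Total bytes mapped when every entry is valid and distinct."""
    return entries * bytes_per_entry


page_reach = reach_bytes(PAGE_ENTRIES, PAGE_BYTES)
block_reach = reach_bytes(BLOCK_ENTRIES, BLOCK_BYTES)
print(page_reach // 1024, "KB via pages;",
      block_reach // (1024 * 1024), "MB via 512Kbyte entries")
```

The ten software-managed entries cover far more memory than the 56
page entries do.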

> Caches

Instruction cache [none, off-chip, on-chip]? off-chip
Data cache [none, off-chip, on-chip]? off-chip
Are I and D caches separate? yes
I-cache total size in bytes? 16K to 64K
I-cache associativity [direct-mapped, 2-set, 4-set, fully associative]? 4-set
I-cache address block size in bytes (bytes per tag)? 16
I-cache transfer block size in bytes (bytes read on cache miss)? 16
I-cache index [virtual, physical]? virtual
	The distinction only matters when there is more than one CMMU on a
	memory port.  When there's just one, the index is both virtual and
	physical.
I-cache tag [virtual, physical]? physical
D-cache total size in bytes? 16K to 64K
D-cache associativity [direct-mapped, 2-set, 4-set, fully associative]? 4-set
D-cache writes [write-through, write-back]? write-through or write-back
D-cache address block size in bytes (bytes per tag)? 16
D-cache transfer block size in bytes (bytes read on cache miss)? 16
D-cache index [virtual, physical]? virtual
	See comment for I-cache index.
D-cache tag [virtual, physical]? physical
Is there a secondary cache? no
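
The remark that a single CMMU's index is both virtual and physical can
be checked from the geometry: if every index bit lies inside the page
offset, translation never changes it.  The Python sketch below assumes
16K bytes of cache per CMMU (inferred from the 16K to 64K range and
the chip counts above, not stated outright) and power-of-two sizes;
the function name is hypothetical.

```python
def index_bit_range(cache_bytes, ways, line_bytes):
    """Return (lowest, highest) address bit used to select a set.

    Assumes cache_bytes, ways and line_bytes are powers of two.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return offset_bits, offset_bits + index_bits - 1


PAGE_BYTES = 4096
page_offset_bits = PAGE_BYTES.bit_length() - 1  # bits below this are untranslated

low, high = index_bit_range(16 * 1024, ways=4, line_bytes=16)
index_untranslated = high < page_offset_bits
print(low, high, index_untranslated)
```

With these numbers the index comes entirely from the untranslated page
offset.  The sketch does not model how an address is steered among
several CMMUs on one port, which is where the distinction would start
to matter.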

> Branch Prediction

What form of branch prediction is used, if any? none

> Other

Describe other unique or interesting features of the architecture or
its implementation.
E.g. describe the functional units, with emphasis on non-standard
units.

There are four 32-bit scratch "control" registers available in
supervisor mode.

There's a user-writable "floating point control register" with bits
like "disable divide-by-zero exception", "disable overflow exception",
and so on.  The bits are not interpreted by the hardware; the exception
always occurs, and it's up to the kernel to fix up the imprecise result
and make it appear to the user as though the exception hadn't occurred.
The kernel does all the right IEEE things, including implementing
not-a-number.

There's an instruction to trap on subscript out of range.

A bit in the PSR selects whether the data space is big-endian or
little-endian.
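
As a reminder of what that selection changes, here is how one 32-bit
value is laid out in memory under each byte order.  This Python sketch
only illustrates byte order in general; the big_endian flag is a
hypothetical stand-in and does not model the PSR bit itself.

```python
def store_word(value, big_endian):
    """Bytes of a 32-bit word in increasing address order."""
    order = "big" if big_endian else "little"
    return (value & 0xFFFFFFFF).to_bytes(4, order)


word = 0x11223344
print(store_word(word, big_endian=True).hex())
print(store_word(word, big_endian=False).hex())
```

The same word reads back unchanged under either setting; only the
byte at each address differs.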

The instruction and data pipelines are exposed to software.  Exception
handling involves a lot of overhead; the code has to deal with up to
six outstanding user page faults and up to nine outstanding floating
point exceptions.  You can't just duck in and out of a device interrupt
routine and then return with RTE.
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086