Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!xavax!jat
From: jat@xavax.com (John Tamplin)
Newsgroups: comp.sys.m88k
Subject: Re: Emulating other computers on 88K's and Benchmarks
Message-ID: <1990Oct12.202403.26793@xavax.com>
Date: 12 Oct 90 20:24:03 GMT
Organization: Xavax
Lines: 59

In article <TOM.90Oct8065144@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
:>>>>> Regarding Emulating other computers on 88K's and Benchmarks; newton@smoggy.gg.caltech.edu (Mike Newton) adds:
:newton> 	[2] The memory model, including wait states.  The lower end
:newton> 	    DG machines have a fair number of wait states -- a fact
:newton> 	    that surprised me, considering their memory is custom.
:
:Before you complain about memory wait states you should figure out where
:they are all coming from. A vast number of cycles are consumed by overhead
:in the 88200 MMU chip - as a rough example, if a certain configuration of
:memory and MMUs take 16 cycles to fill a cache line, 3 of those are the time
:it takes to walk through the data unit pipeline, 2 or 3 of the remaining
:cycles are the time it takes to access memory, and the remainder are
:consumed by the MMU. Even doubling the speed of memory would only reduce
:the 16 cycles to 14 or 15.

Memory speed generally makes only a few cycles of difference --
even at 33 MHz the difference between 60ns and 120ns RAMs is 3 cycles.
The time taken in the data pipeline is usually in parallel with the
execution of other instructions -- good pipelining by the compiler will
hide that time.  Also, this time is the same regardless of external
memory architecture, so I ignore it for the purposes of comparison.

:Please Note: The above figures are from my memory of one example we worked
:out in detail - there are A LOT of different types of memory cycles and
:paths through the MMU and this was one specific example we worked through (I
:seem to recall it was doing a load from a non-cached memory location). The
:specific figures quoted may be wrong, but the approximate percentage speed
:improvement from using faster memory chips is about right (in other words,
:barely significant :-).
:--
:======================================================================
:domain: tahorsley@csd.harris.com       USMail: Tom Horsley
:  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
:                                               Delray Beach, FL  33444
:+==== Censorship is the only form of Obscenity ======================+
:|     (Wait, I forgot government tobacco subsidies...)               |
:+====================================================================+

The AViiON desktop system (not sure of model numbers) fills a cache line
in 16 clock cycles.  The Topgun does the same in 11.  In the 88200 manual,
Motorola gives a circuit (although it has several problems, including
marginal timing) that does it in 8.  The theoretical minimum is 7: the
88200 takes 2 clocks to decide it needs to hit the bus, then 1 address
phase and 4 data phases.  At 20 MHz with 60ns memory, 2-way interleaved,
this is achievable.  With 100ns chips, you can do it in 8.  If you need
to go across a bus to get to the memory or if the capacitance becomes a
problem, it will cost you another cycle.  The Motorola MVME181 board gets
a cache burst in 8 cycles, as does an Opus board.  I don't understand where
DG's time is going -- perhaps so they can sell faster systems?
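
As a rough sketch of the arithmetic above (the function and constant
names are made up for illustration; the cycle counts are just the
figures quoted in this post, not measurements), the budget for a line
fill and the overhead each system carries over the 7-cycle minimum can
be tallied like this:

```python
# Cycle budget for an 88200 cache line fill, using the figures quoted in
# the post. All names here are hypothetical, chosen for illustration.
DECIDE_CYCLES = 2    # clocks for the 88200 to decide it must go to the bus
ADDRESS_PHASES = 1   # one address phase
DATA_PHASES = 4      # four data phases per line fill


def line_fill_cycles(extra_cycles=0):
    """Minimum fill time plus any extra cycles (slower RAM, bus hop, ...)."""
    return DECIDE_CYCLES + ADDRESS_PHASES + DATA_PHASES + extra_cycles


def fill_time_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1000.0 / clock_mhz


# Line-fill cycle counts as reported in the post (assumes TLB hits).
REPORTED_FILL_CYCLES = {
    "AViiON desktop": 16,
    "Topgun": 11,
    "Motorola 88200 manual circuit": 8,
    "MVME181": 8,
    "Opus": 8,
}


def overhead_vs_minimum():
    """Cycles each reported system spends beyond the theoretical minimum."""
    minimum = line_fill_cycles()
    return {name: cycles - minimum
            for name, cycles in REPORTED_FILL_CYCLES.items()}


if __name__ == "__main__":
    print("theoretical minimum:", line_fill_cycles(), "cycles")
    for name, extra in overhead_vs_minimum().items():
        print(f"{name}: {extra} cycles over the minimum")
```

By this count the AViiON desktop spends 9 cycles beyond the minimum,
the Topgun 4, and the 8-cycle designs 1 each.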

Of course, these numbers all assume no translation delays, i.e., TLB hits.

The AViiON numbers came from a technical person I talked to there (I
don't remember the name) when I was discussing pipelining optimizations.

-- 
John Tamplin						Xavax
jat@xavax.COM						2104 West Ferry Way
...!uunet!xavax!jat					Huntsville, AL 35801