Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!xavax!jat
From: jat@xavax.com (John Tamplin)
Newsgroups: comp.sys.m88k
Subject: Re: Emulating other computers on 88K's and Benchmarks
Message-ID: <1990Oct12.202403.26793@xavax.com>
Date: 12 Oct 90 20:24:03 GMT
Organization: Xavax
Lines: 59

In article tom@ssd.csd.harris.com (Tom Horsley) writes:
:>>>>> Regarding Emulating other computers on 88K's and Benchmarks;
:>>>>> newton@smoggy.gg.caltech.edu (Mike Newton) adds:
:newton> [2] The memory model, including wait states. The lower end
:newton> DG machines have a fair number of wait states -- a fact
:newton> that surprised me, considering their memory is custom.
:
:Before you complain about memory wait states you should figure out where
:they are all coming from. A vast number of cycles are consumed by overhead
:in the 88200 MMU chip - as a rough example, if a certain configuration of
:memory and MMUs take 16 cycles to fill a cache line, 3 of those are the time
:it takes to walk through the data unit pipeline, 2 or 3 of the remaining
:cycles are the time it takes to access memory, and the remainder are
:consumed by the MMU. Even doubling the speed of memory would only reduce
:the 16 cycles to 14 or 15.

Memory speed generally makes only a few cycles' difference -- even at
33 MHz, the difference between 60ns and 120ns RAMs is 3 cycles. The time
taken in the data pipeline usually overlaps with the execution of other
instructions -- good pipelining by the compiler will hide that time. Also,
this time is the same regardless of the external memory architecture, so I
ignore it for the purposes of comparison.

:Please Note: The above figures are from my memory of one example we worked
:out in detail - there are A LOT of different types of memory cycles and
:paths through the MMU and this was one specific example we worked through (I
:seem to recall it was doing a load from a non-cached memory location). The
:specific figures quoted may be wrong, but the approximate percentage speed
:improvement from using faster memory chips is about right (in other words,
:barely significant :-).
:--
:======================================================================
:domain: tahorsley@csd.harris.com       USMail: Tom Horsley
:  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
:                                               Delray Beach, FL  33444
:+==== Censorship is the only form of Obscenity ======================+
:|     (Wait, I forgot government tobacco subsidies...)               |
:+====================================================================+

The AViiON desktop system (not sure of model numbers) fills a cache line in
16 clock cycles. The Topgun does the same in 11. In the 88200 manual,
Motorola gives a circuit (although it has several problems, including
marginal timing) that does it in 8. The theoretical minimum is 7, since the
88200 takes 2 clocks to decide it needs to hit the bus, followed by 1
address phase and 4 data phases. At 20 MHz with 60ns memory, 2-way
interleaved, this minimum can be achieved. With 100ns chips, you can do it
in 8. If you need to go across a bus to get to the memory, or if the
capacitance becomes a problem, it will cost you another cycle.

The Motorola MVME181 board gets a cache burst in 8 cycles, as does an Opus
board. I don't understand where DG's time is going -- perhaps so they can
sell faster systems?

Of course, these numbers all assume no translation delays, i.e., TLB hits.
The AViiON numbers came from a technical person I talked to there (I don't
remember the name) when I was discussing pipelining optimizations.

-- 
John Tamplin                        Xavax
jat@xavax.COM                       2104 West Ferry Way
...!uunet!xavax!jat                 Huntsville, AL 35801
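
For anyone who wants to play with the cache-fill numbers above, here is a
rough sketch of the arithmetic in Python. The cycle counts are the ones
quoted in this post; the function and table names are made up for
illustration, and the conversion to nanoseconds simply assumes one clock
period per cycle at the stated clock rate.

```python
# Components of the theoretical minimum cache-line fill, as quoted in the post.
DECIDE_CLOCKS = 2    # clocks the 88200 takes to decide it needs to hit the bus
ADDRESS_PHASES = 1   # one address phase
DATA_PHASES = 4      # four data phases


def theoretical_minimum():
    """Minimum clocks to fill a cache line, assuming a TLB hit."""
    return DECIDE_CLOCKS + ADDRESS_PHASES + DATA_PHASES


# Cache-line fill times in clock cycles, as reported in the post.
REPORTED_FILL_CYCLES = {
    "AViiON desktop": 16,
    "Topgun": 11,
    "Motorola 88200 manual circuit": 8,
    "Motorola MVME181": 8,
    "Opus board": 8,
}


def overhead_table(reported=REPORTED_FILL_CYCLES):
    """Cycles each system spends beyond the theoretical minimum."""
    minimum = theoretical_minimum()
    return {name: cycles - minimum for name, cycles in reported.items()}


def fill_time_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds (illustrative assumption:
    one clock period per cycle at the given clock rate)."""
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    print("theoretical minimum:", theoretical_minimum(), "cycles")
    for name, extra in overhead_table().items():
        print(f"{name}: {extra} cycles over the minimum")
    print("minimum fill at 20 MHz:", fill_time_ns(theoretical_minimum(), 20), "ns")
```

The table only compares cycle counts, since the post does not give clock
rates for the individual systems; the 20 MHz figure is used solely for the
minimum-fill case mentioned above.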