Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!crdgw1!sixhub!davidsen
From: davidsen@sixhub.UUCP (Wm E. Davidsen Jr)
Newsgroups: comp.unix.i386
Subject: Re: 386 Motherboards
Message-ID: <898@sixhub.UUCP>
Date: 2 May 90 22:37:54 GMT
References: <15966@cbnews.ATT.COM> <332@hub.cs.jmu.edu> <883@sixhub.UUCP> <292@zds-ux.UUCP>
Reply-To: davidsen@sixhub.UUCP (bill davidsen)
Organization: *IX Public Access UNIX, Schenectady NY
Lines: 52

In article <292@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
| In article <883@sixhub.UUCP> davidsen@sixhub.UUCP (bill davidsen) writes:
| >In article <332@hub.cs.jmu.edu> arch@hub.cs.jmu.edu (Arch Harris) writes:

| >  I believe there are, but my personal experience is that over 64k you
| >hit diminishing returns (actually 32k does a lot).
| 
| This fits with some tables Intel published on various cache organizations
| and sizes, and with my experience.  However, the data also makes it clear
| that organization makes a big difference too, that is 32k 4way set-assoc.
| could end up about the same hit rates as 128k direct mapped (I don't have
| the data in front of me, but I remember that set-assoc. buys you quite
| a bit).  Also, keep in mind that those last few percent of hit rate can
| make a big difference if the main memory is very slow.

  A good point. I stand clarified if not actually corrected. A cache
with a 95% hit rate instead of 90% goes to main memory only half as often.
Now the question is, what does that cost in terms of wait states?
That's very machine dependent, but let's assume two for a cache hit and
four for a miss. Then,

0.90 * 2w/s + 0.10 * 4w/s = 2.2w/s (effective)
0.95 * 2w/s + 0.05 * 4w/s = 2.1w/s

A gain of 5%. If you assume 6w/s (slow but not unheard of)

0.90 * 2w/s + 0.10 * 6w/s = 2.40
0.95 * 2w/s + 0.05 * 6w/s = 2.20

A gain of 8%. Finally, if you assume 6w/s and 16 bit memory (yecch)

0.90 * 2w/s + 0.10 * 12w/s = 3.00
0.95 * 2w/s + 0.05 * 12w/s = 2.50

A gain of 17%. If you make the main memory slow enough you can justify a
lot of cache, and be making a good decision. I think the last case is
pretty unlikely, but I'm told that you can get 12 clock access even with
32 bit memory when the memory is non-interleaved.
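
  For anyone who wants to plug in their own numbers, the weighted
averages above can be checked mechanically. What follows is only a rough
sketch: the function names effective_wait and gain are made up for
illustration, and "gain" is taken to mean the fractional reduction in
effective wait states, which is an assumption about how the percentages
above were figured.

```python
def effective_wait(hit_rate, hit_ws, miss_ws):
    """Average wait states per access, weighting hit and miss costs."""
    return hit_rate * hit_ws + (1.0 - hit_rate) * miss_ws


def gain(old, new):
    """Fractional reduction in effective wait states (assumed definition)."""
    return (old - new) / old


# Miss costs from the three cases above; a hit is assumed to cost 2 w/s.
for miss_ws in (4, 6, 12):
    slow = effective_wait(0.90, 2, miss_ws)   # 90% hit rate
    fast = effective_wait(0.95, 2, miss_ws)   # 95% hit rate
    print("miss=%2d w/s  90%%: %.2f  95%%: %.2f  gain: %.1f%%"
          % (miss_ws, slow, fast, 100 * gain(slow, fast)))
```

Change the hit rates or the miss costs to match your own machine; the
point is only that the payoff from a better hit rate grows with the miss
penalty.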

  The question then gets cloudier when you consider that doubling the
cache size doesn't double the hit rate (or more accurately halve the
miss rate) for most loads. So while it's probably true to say that more
cache or faster memory will always make your system faster, it's not
true that any kind of memory will always be the most cost effective
improvement to a system. Maybe adding an FPU or a better disk controller
will do more.

  People are still getting master's theses from this, so we're not going
to solve it in a screen or two.
-- 
bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
    sysop *IX BBS and Public Access UNIX
    moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me