Path: utzoo!censor!geac!maccs!cs4g6ag
From: cs4g6ag@maccs.dcss.mcmaster.ca (Stephen M. Dunn)
Newsgroups: comp.sys.ibm.pc
Subject: Re: SUMMARY: ... best way to determine cache size (LONG!)
Keywords: cache size ramdisk
Message-ID: <25E956F9.934@maccs.dcss.mcmaster.ca>
Date: 26 Feb 90 16:19:05 GMT
References: <2559@leah.Albany.Edu>
Reply-To: cs4g6ag@maccs.dcss.mcmaster.ca (Stephen M. Dunn)
Distribution: usa
Organization: McMaster University, Hamilton, Ontario
Lines: 53

In article <2559@leah.Albany.Edu> emb978@leah.Albany.Edu (Eric M. Boehm) writes:
$Included below are the responses I received to my question "What is the
$best way to determine cache size". 
[...]
$There are two viewpoints on caching:  One holds that a cache is like a
$special, high-speed RAMdisk that uses virtual memory to look huge.  From
$that viewpoint, you should go to a 3 or 4 Meg cache.  You might get very
$fast performance, since entire applications will be buffered onto the cache.
$(I'd love to do some C compilers with a cache that size.)  The other argument
$is that a cache is for repeated reads of one or two files, and should
$therefore be about as large as the largest file (or application) you will use.

   Well, the "ramdisk" view is pretty accurate _if_ you're only reading files
and not writing them.  Once you start writing, a ramdisk is much faster,
since a write only touches memory, while a disk cache has to write to the
disk as well.  Almost all caches are write-through designs, which means
that whatever you write goes onto disk immediately.  The only exception
that I know of is Super PC-Kwik, which is a write-back design that
performs background writes while you're doing something else with your PC.
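
   To make the write-through/write-back difference concrete, here is a toy
model in Python that only counts how many physical disk writes each design
would issue.  The class and method names are made up for illustration, and
this is not a description of how Super PC-Kwik or any other real cache is
implemented; it is a sketch under the assumption that a write-back cache
writes each dirty block once when it gets around to flushing.

```python
class WriteThroughCache:
    """Toy model: every write is passed to the disk immediately."""

    def __init__(self):
        self.blocks = {}
        self.disk_writes = 0

    def write(self, block, data):
        self.blocks[block] = data
        self.disk_writes += 1      # the disk is hit on every single write


class WriteBackCache:
    """Toy model: writes stay in memory until flush() is called."""

    def __init__(self):
        self.blocks = {}
        self.dirty = set()         # blocks changed since the last flush
        self.disk_writes = 0

    def write(self, block, data):
        self.blocks[block] = data
        self.dirty.add(block)      # remember it, but do not touch the disk

    def flush(self):
        # Assumption: one physical write per dirty block, however many
        # times that block was rewritten in memory.
        self.disk_writes += len(self.dirty)
        self.dirty.clear()


def rewrite_same_block(cache, times):
    """Hypothetical workload: overwrite block 7 again and again."""
    for i in range(times):
        cache.write(7, i)
    if hasattr(cache, "flush"):
        cache.flush()
    return cache.disk_writes
```

   Rewriting the same block ten times costs the write-through model ten
disk writes and the write-back model one.  A ramdisk in this picture would
cost zero, which is why it wins as soon as you write.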

   The repeated reads argument also has some validity, especially as it
applies to the directory and file allocation information if you're using
several files and opening and closing them a fair bit.

$From: jdudeck@polyslo.CalPoly.EDU (John R. Dudeck)
$If you want to try to do measurements, you probably will never get anything
$conclusive.  The reason for this is the way in which a cache works.  It holds
$the disk data that has been read, in the hopes that you will try to read the
$same data again, thus removing the need for a repeated disk access to that
$data.  If you keep reading the same data over and over, the cache will
$always get a hit, and you will have fantastic speed improvements over no
$cache at all.  If you read different data each time you read, there will
$be no improvement at all.  It all depends on how much your work rereads
$the same data without intervening reads of other data.  This can be
$determined easier on the back of an envelope than by doing benchmark
$measurements!

   So the point is that benchmark-type activity is meaningless, and I
agree.  What you do instead is set up a disk cache, do whatever you
normally do with the machine, and have a look at the cache statistics
at the end of the day.  Or, for smaller-scale timing, compile and link a
program of the size you typically work with, and see how long it takes
with no cache and with caches of various sizes.  If you keep in mind how
a cache works, you can test effectively to see how much of a performance
difference it makes ... it's only if you try doing the same thing over
and over again that your results get screwy.
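
   If you can get a list of the disk blocks your normal work actually reads,
you can also replay it against different cache sizes on paper, or with a few
lines of Python as below.  This is only a sketch: it assumes a
least-recently-used replacement policy, which real cache programs may or may
not use, and the function name and the trace are invented for the example.

```python
from collections import OrderedDict


def hit_ratio(trace, cache_blocks):
    """Fraction of reads in `trace` served from a cache of `cache_blocks`
    blocks, assuming LRU replacement (an assumption, not a given)."""
    cache = OrderedDict()          # oldest entry first, newest last
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace) if trace else 0.0


# A made-up "same thing over and over" workload: three blocks, four passes.
looping_trace = [1, 2, 3] * 4
# A made-up workload that never rereads anything.
one_pass_trace = list(range(12))
```

   With these traces, hit_ratio(looping_trace, 3) comes out at 0.75 and
hit_ratio(looping_trace, 2) at 0.0, while one_pass_trace gets 0.0 at any
size.  That is the screwiness in a nutshell: a repeated loop either fits
entirely or thrashes entirely, and neither number tells you much about a
normal day's work.
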
-- 
Stephen M. Dunn                               cs4g6ag@maccs.dcss.mcmaster.ca
          <std_disclaimer.h> = "\nI'm only an undergraduate!!!\n";
****************************************************************************
               I Think I'm Going Bald - Caress of Steel, Rush