Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!uunet!telesoft!choll
From: choll@telesoft.com (Chris Holl @adonna)
Newsgroups: comp.benchmarks
Subject: Re: bc benchmark [sigh]
Summary: QBM - Boeing Capacity Benchmark.
Message-ID: <1148@telesoft.com>
Date: 31 Dec 90 20:44:35 GMT
References: <44342@mips.mips.COM> <15379@ogicse.ogi.edu> <1142@telesoft.com> <44371@mips.mips.COM>
Organization: TeleSoft, San Diego, CA.
Lines: 170

From: borasky@ogicse.ogi.edu (M. Edward Borasky)
> I was hoping for a response like this.  There are two types of computer
> users -- those like you and me who realize that computing costs money,
> is a resource that must and can be managed, and those like students,
> computer science faculty, dreamer/architects who think that computing
> should be, can be and often is essentially free.

Boeing Computer Services occasionally received criticism of their "high"
bills for computing services.  In response they produced a paper called
"The Real Cost of Computing" to educate their customers.  It described
many of the things you mentioned, including support of the hardware,
software, configuration, backups, etc.  I don't know if the paper is
available, but I could find out if there is interest.

> I'll bet that the COST difference between the two machines was such
> that you could afford to give away the extra speed on the X-MP from
> the hardware scatter/gather -- bill the X-MP as if it were strictly
> 1.32 times the Cray 1.

That's exactly what we wound up doing.  We couldn't justify a larger
figure because a job that ran on the X without using scatter/gather
would have gotten a higher bill than it would have on the S, and that
wouldn't do.  Using 1.32, a job that used scatter/gather simply got a
better deal on the X.  We made a point of telling users this (so they
got the proper perspective :-) and to encourage them to take advantage
of the hardware.  If their jobs became more efficient, throughput
would go up.

The algorithm for billing was slightly different, however.  The 1-S had
a single processor and 2 Meg, while the X-MP had 2 processors and 4 Meg.
We billed for CPU seconds and memory residency.  When a job started
using more than 2 Meg, it started to pay a percentage of the other
processor, even if it wasn't using it.  If a job used the entire 4 Meg,
it paid for both processors, because no one else could use the other
processor without occupying memory.

BCS took the 1-S out of the configuration after users had migrated to
the X, so after the overlap period everyone got a better deal.  Now
they have a Y-MP.  I wasn't there for that transition, so I don't know
exactly how they managed it.
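To make that billing rule concrete, it amounts to something like the
little routine below.  This is just a sketch in Python-style pseudocode
with made-up rates, and the exact proration BCS used was more involved,
so take the flavor, not the numbers:

    TOTAL_MEM = 4.0             # X-MP memory, in Meg
    HALF_MEM  = TOTAL_MEM / 2   # one processor's "fair share"
    CPU_RATE  = 1.00            # $ per CPU second (made up)
    MEM_RATE  = 0.01            # $ per Meg-second resident (made up)

    def xmp_charge(cpu_seconds, mem_meg, residency_seconds):
        """Charge for one job on the 2-processor, 4 Meg X-MP."""
        charge = cpu_seconds * CPU_RATE
        charge += mem_meg * residency_seconds * MEM_RATE
        if mem_meg > HALF_MEM:
            # The job is squeezing the second processor out of memory,
            # so it pays a matching fraction of that processor for as
            # long as it stays resident.
            locked_out = (mem_meg - HALF_MEM) / HALF_MEM  # 0.0 to 1.0
            charge += locked_out * residency_seconds * CPU_RATE
        return charge

A job that burns 100 CPU seconds, sits resident for 600 seconds, and
uses all 4 Meg pays for the idle processor for its entire residency --
which is exactly the "no one else could use it" case above.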
> You just said the secret word -- "capacity planning"!  I wish the duck
> were still around to drop down and give you fifty dollars!

Duck?  $50?

> >(Dr. Howard "Doc" Schemising - wonderful guy) developed a capacity
> >test that would precisely model the current workload on a machine.
> >This was used with great accuracy to measure the capacity of
> >different machines (for that workload).
>
> Is this published?  Could you post it?  The guys here and in
> "comp.arch" would LOVE to see it!

In article <44371@mips.mips.COM>, mash@mips.COM (John Mashey) writes:
> Yes, it would be good to see this.  Note the important fact that there
> are two steps:
>    a) Characterizing the workload
>    b) Predicting the performance on that workload
> Also, I've seen some pretty good benchmarks, with workloads tailored
> to different departments within a company ... unfortunately, the best
> ones I've seen were all proprietary...

Yes, step a) is critical.  And yes, Doc Schmeising's benchmark is
proprietary.  (Sorry I typo-ed his name the first time.)  I have had a
few requests for more information on this capacity benchmark, and I
don't think the following violates any of BCS' rights.

Doc Schmeising's benchmark was called QBM (Quick BenchMark) and is
owned by Boeing Computer Services (BCS).  There was talk at one time
of marketing it, but they never did.  A shame, because it is a great
tool.  Doc has retired and I'm not there any more, so I don't even
know if they are still using it.

The basic premise is to take a "slice" of your system's workload that
runs in some fixed period of time (we used 10 minutes -- hence Quick)
and dump it onto another system to see how long it takes.  If the work
can't complete in 10 minutes, the target system has less throughput
than the base system.  If it can complete in 10 minutes, the target
system has equal or greater throughput.

The tricky part is to quantify throughput, and this was QBM's real
strength.  BCS collected data on a variety of resources used by jobs.
This included things like

    . CPU seconds burned
    . Amount of memory used
    . Duration of memory residency
    . Disk blocks transferred
    . Disk accesses

and a few others.  This data was stored on tapes and went back years.
It was collected for the CDC Cybers and Crays (the two main workhorses
at BCS).

You, as the performance guru and benchmarker, would have to select a
10 minute window where your machine was "full."  Full does not mean
the system was on its knees and response time was horrible.  It means
processing a reasonable workload with acceptable response time, good
CPU usage, no thrashing, etc.  I would pick a period of a busy day --
say a busy hour, or 2 hours, or 30 minutes, or whatever -- and feed it
to QBM.  QBM would filter out noise and select 10 minute samples from
the time period.  It would select as many as you asked for (since the
10 minute slices could start on any fraction of a second) and print
out a variety of statistics about each sample.  A GREAT DEAL OF CARE
WAS TAKEN TO PICK GOOD DATA.  A lot of accounting and performance data
was reviewed; eventually one sample was picked, and that was called
your baseline.  It represented the "full" capacity of your base
machine.

This selection process was done only once or twice a year, when you
felt the work profile (the type of work being done) had changed enough
to affect your results.  This would happen.  For example, after users
migrated from a Cyber to a Cray, they would eventually start writing
code to take better advantage of vectorization.

QBM would then take the data from that sample and produce a synthetic
workload consisting of the same number of jobs, starting and stopping
at the same times during the 10 minute window (very important) and
using the same resources.  Some assumptions were made about how the
CPU seconds were used (matrix reductions?  straight-line code?  etc.)
and how they were distributed during the job.  You can't use all the
CPU seconds and then do all the I/O; on the other hand, an even
distribution is not representative either.  A variety of CPU kernels
were used (the 10 to 14 I mentioned in my first posting) and given
different weightings depending on what we thought our machines were
being used for.

Now here's the real beauty of QBM: not only could it create a workload
that was the same as your real-life sample, it could create a workload
that was 1.5 times that sample.  Or 5 times, or 0.5 times.  You could
now create any multiple of that workload.
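For flavor, here is roughly what that generation step looks like, again
as a Python-style sketch.  The field names, the two kernels, and the
scale-by-replication scheme are all my inventions for illustration --
QBM was proprietary and considerably more careful about all of this:

    import random

    # One record per job in the baseline slice, off the accounting
    # tapes.  Times are seconds into the 10 minute window.
    SAMPLE = [
        {"start":  0.0, "stop": 600.0, "cpu": 240.0, "mem": 1.5, "blocks": 8000},
        {"start": 37.5, "stop": 312.0, "cpu":  90.0, "mem": 0.5, "blocks": 2500},
        # ... one entry per job in the sample
    ]

    # Weightings for the CPU kernels (matrix work, straight-line code,
    # etc.).  The real thing used 10 to 14 kernels; two will do here.
    KERNELS = {"matrix": 0.6, "scalar": 0.4}

    def synthesize(sample, multiplier):
        """Build synthetic jobs at some multiple of the sampled load.
        This sketch scales by replicating jobs; I'm not claiming
        that's how QBM did it."""
        jobs = []
        for i in range(round(len(sample) * multiplier)):
            rec = sample[i % len(sample)]
            # Split the job's CPU seconds across the weighted kernels
            # and interleave the bursts with slices of its I/O, so the
            # job doesn't burn all its CPU and then do all its I/O.
            steps = []
            for kernel, weight in KERNELS.items():
                steps.append(("cpu", kernel, rec["cpu"] * weight))
                steps.append(("io", rec["blocks"] // len(KERNELS)))
            random.shuffle(steps)  # crude stand-in for a realistic mix
            jobs.append({"start": rec["start"], "stop": rec["stop"],
                         "mem": rec["mem"], "steps": steps})
        return jobs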
A lot of testing went into QBM to be sure that the jobs it created
would actually run in exactly 10 minutes on the base machine.  It is
much to Doc's credit that they did.  A QBM workload of 1.0 ran in 10
minutes.  If 1.1 also ran in 10 minutes, then the sample you picked
wasn't really from a time when your machine was full.  After some
experience with it, we had samples where 0.9 would complete in 10
minutes, 1.0 would also, but 1.1 would not.

We would then take these samples to a new machine and scale the load
up or down until the jobs JUST ran in 10 minutes.  In this way we
could report that "For our current workload AND configuration, the
X-MP will provide 3.85 times the throughput of the 1-S."  (This was
our actual result.)  Of course the configurations of both base and
target systems had to be taken into account.

This is significant also, because we could benchmark different
configurations on the same machine.  Suppose we had more channels?
Fewer channels and faster disks?  More disks?  All these questions
could be answered by setting up test configurations and measuring the
throughput.

No papers were written about QBM (unfortunately), although I did
present benchmarking results at a couple of CUG (Cray Users Group)
conferences, and those papers are available.

Hope this has helped,
Chris.

Christopher Holl
TeleSoft
5959 Cornerstone Ct. W.
San Diego, CA 92121
(619) 457-2700
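P.S.  The "scale until the jobs JUST run in 10 minutes" step is really
just a search over the workload multiplier.  Sketched in the same
Python-style pseudocode (in practice we did this by hand, one run at a
time, and nothing below is QBM code):

    def capacity_ratio(fits, lo=0.1, hi=20.0, tol=0.01):
        """Largest workload multiple the target machine can finish in
        the 10 minute window, found by bisection.  fits(m) builds the
        synthetic workload at multiple m, runs it, and reports whether
        everything finished in time -- that call is the expensive
        part."""
        assert fits(lo) and not fits(hi)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if fits(mid):
                lo = mid      # still fits; push the load up
            else:
                hi = mid      # too much; back off
        return lo

With the baseline machine pinned at 1.0 by construction, the number
this returns for the target machine is the throughput ratio we
reported -- 3.85 for the X-MP over the 1-S.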