Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!uunet!dg!Publius
From: Publius@dg.dg.com (Publius)
Newsgroups: comp.arch
Subject: SPEC, SPECthruput
Message-ID: <428@dg.dg.com>
Date: 3 May 90 16:43:10 GMT
Reply-To: publius@dg-pag.webo.dg.com (Publius)
Distribution: na
Organization: Performance Analysis Group, Data General, Westboro, MA.
Lines: 49

SPECthruput is a performance measurement defined by SPEC.  Compared to
the well-known SPECmark, it goes one step further: it recognizes the
importance of multiprocessor systems in the marketplace.
In many real-world environments, throughput has more significance than
elapsed time, and thus SPECthruput is a better performance measurement
than SPECmark.

However, in its present form, SPECthruput has a few shortcomings.  These
shortcomings should be fixed before SPECthruput can live up to its
good and noble intention.

The first problem I can see in the current SPECthruput methodology is that
it does not specify the maximum time slice allowed.  As we all understand,
the larger the time slice is, the less the context switch overhead will be.
The context switch overhead here includes not only saving and restoring
the process context, but also the warming up of the caches and the
address translation table.  Not specifying the maximum time slice
allows vendors to inflate SPECthruput numbers by setting a time slice
larger than a real-world application environment could accept.
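
To put rough numbers on this, here is a minimal sketch of the
trade-off.  It assumes a very simple model that is not part of the
SPEC methodology: every time slice is followed by exactly one context
switch of fixed cost.  The function name and the 1 ms switch cost are
hypothetical, chosen only for illustration.

```python
def useful_fraction(time_slice_ms, switch_cost_ms):
    """Fraction of CPU time left for useful work when each time slice
    is followed by one context switch of fixed cost (save/restore plus
    cache and translation-table warm-up, lumped into one number)."""
    if time_slice_ms <= 0 or switch_cost_ms < 0:
        raise ValueError("time slice must be > 0 and switch cost >= 0")
    return time_slice_ms / (time_slice_ms + switch_cost_ms)


# Hypothetical switch cost; a real value depends on the machine and load.
SWITCH_COST_MS = 1.0

for time_slice in (10, 100, 1000):
    fraction = useful_fraction(time_slice, SWITCH_COST_MS)
    print(f"time slice {time_slice:5d} ms: {fraction:.4f} useful")
```

Under this model the useful fraction only grows as the time slice
grows, which is why an unbounded time slice favors the reported number.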
 
The second problem is that what the current SPECthruput methodology
measures is the BATCH THROUGHPUT, not the throughput in a time-shared
environment.  This has at least two implications.  One is about
job scheduling.  The other is about cache utilization.

Concerning job scheduling, there is a fundamental difference between
a batch processing environment and a time-shared environment, especially
on a multiprocessor system.  In a batch processing environment, especially
if all the jobs take about the same processing time (as in the case of
the SPECthruput methodology), the OS can assign each job a "preferred
processor" and run that job on only that processor, which enhances the
throughput by reducing the burden of keeping caches coherent.
In a time-shared environment, processes come and go in a random manner,
and the "preferred processor" scheme mentioned above won't work as well.
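
As a toy illustration of the scheduling difference, the sketch below
counts how often a job resumes on a different processor than the one
it last ran on.  Everything here is an assumption made for the example:
the function name, the static "job number modulo processor count"
assignment, and the use of random placement as a crude stand-in for a
time-shared scheduler.  Real schedulers are not random, and migrations
are only a proxy for coherence and cache-refill cost.

```python
import random


def count_migrations(n_jobs, n_cpus, n_slices, preferred, seed=0):
    """Count how many times a job resumes on a different processor."""
    rng = random.Random(seed)   # fixed seed keeps the run repeatable
    last_cpu = {}               # job -> processor it last ran on
    migrations = 0
    for _ in range(n_slices):
        for job in range(n_jobs):
            if preferred:
                cpu = job % n_cpus           # static preferred processor
            else:
                cpu = rng.randrange(n_cpus)  # placed wherever it lands
            if job in last_cpu and last_cpu[job] != cpu:
                migrations += 1
            last_cpu[job] = cpu
    return migrations


print("preferred processor:", count_migrations(8, 4, 100, preferred=True))
print("arbitrary placement:", count_migrations(8, 4, 100, preferred=False))
```

With the static assignment no job ever migrates, so each job keeps
whatever cache contents it left behind on its own processor.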

Concerning cache utilization, there is also a difference in characteristics.
In a heavily loaded time-shared system, the physical memory tends to get
fragmented.  It is well known that fragmentation of physical memory
can result in inefficient utilization of a direct-mapped cache when the
cache size is larger than the page size (or more accurately, the cluster
size in memory management).
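
The effect can be sketched with a little arithmetic.  The example
assumes a physically indexed direct-mapped cache; the 64 KB cache size,
4 KB page size, frame numbers, and function name are all hypothetical.
With these sizes the cache holds 16 page-sized regions, and a physical
page frame always maps to region (frame number mod 16).

```python
import random

CACHE_SIZE = 64 * 1024            # hypothetical direct-mapped cache
PAGE_SIZE = 4 * 1024              # hypothetical page size
REGIONS = CACHE_SIZE // PAGE_SIZE  # page-sized cache regions (16 here)


def regions_used(frames):
    """Number of distinct cache regions covered by the given frames."""
    return len({frame % REGIONS for frame in frames})


# A 16-page working set laid out three different ways.
contiguous = list(range(100, 100 + REGIONS))     # unfragmented memory
rng = random.Random(0)
scattered = rng.sample(range(4096), REGIONS)     # fragmented memory
worst = [i * REGIONS for i in range(REGIONS)]    # every page collides

print("contiguous:", regions_used(contiguous), "of", REGIONS)
print("scattered: ", regions_used(scattered), "of", REGIONS)
print("worst case:", regions_used(worst), "of", REGIONS)
```

Contiguous frames cover every region exactly once.  Scattered frames
can land on the same region, so pages of one working set may evict
each other while other parts of the cache go unused.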

SPEC has done remarkable things.  Let us keep improving.

-- 
Disclaimer: I speak (and write) only for myself, not my employer.

Publius     "Old federalists never die, they simply change their names."
publius@dg-pag.webo.dg.com