Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uunet!convex!convex1.convex.com!hamrick
From: hamrick@convex1.convex.com (Ed Hamrick)
Newsgroups: comp.arch
Subject: Re: Killer Micros and vectorized code
Summary: Surviving the attack of the killer micros
Message-ID: <100598@convex.convex.com>
Date: 15 Mar 90 00:24:35 GMT
References: <51771@lll-winken.LLNL.GOV>
Sender: news@convex.com
Organization: Convex Computer Corporation, Seattle, Washington
Lines: 148

Mr. Brooks,

I've greatly enjoyed the articles you've written regarding the performance
of "Killer Micros" relative to larger, more costly machines.  Even though
there are exceptions to any general rule, I agree with much of what you've
been saying, but must disagree with the overall conclusion.  The key
generalizations that I agree with are:

(1) The price/performance ratio of a wide range of applications is better
    on smaller machines than larger machines.  This applies primarily
    to applications dominated by scalar code that aren't amenable to
    vectorization or massive parallelism.  This is particularly applicable
    if applications have a locality of reference that can make effective
    use of high-speed cache.

(2) The price per megabyte of disk storage is better for lower-speed and
    lower-density disk drives.

(3) The price per megabyte of memory is better when memory is slower and
    interleaved less.

Many people will argue with all of these generalizations by citing specific
counter-examples, but I believe reasonable people would agree that these
generalizations have some merit.  I also believe that these generalizations
have been valid only in the past five years, and that there have been times
in the past when the opposite was true.

The conclusion you've reached, and that I must admit I have been tempted to
reach myself over the past few years, is that "No one will survive the
attack of the killer micros!"  As a number of people have pointed out, there
are many factors counterbalancing the price/performance advantage of
smaller systems.  One of the key counter-arguments is that machines ought to
be judged on price per productivity improvement.
A faster machine gives people higher productivity because of less time
wasted waiting for jobs, and more design cycles that can be performed
in a given time.  Anything that decreases time-to-market or improves
product quality is worth intrinsically more.  This is one of the traditional
justifications for supercomputers.  You noted that a Cray CPU-hour costs
significantly more than people earn per hour, but this doesn't take
into account that companies can significantly improve their time-to-market
and product quality with faster machines, albeit machines that cost more
per unit of useful work.  This may not matter in some application areas
such as computational physics, but a company like Boeing or McDonnell
Douglas can lose billions of dollars if they are six months late with
getting new products designed.  There are also significant cost multipliers
involved in producing a better product - for instance, a small increase
in airplane fuel efficiency can result in significantly larger market
share than your competition.  Some people have noted that some companies
are willing to pay almost anything to get the fastest computers, and this
is one of the underlying economic reasons for this willingness.

Big companies and government labs tend to use this rationale to justify
procuring computers based on single-job performance.  However, when you
visit these facilities, typically large Cray sites, the machines are mostly
used as large timesharing facilities.  People are finding that machines that
were procured to run large jobs in hours are instead running small jobs in
days.  Further aggravating the problem of having 500 users on a supercomputer is
the tendency of these companies and labs to make the use of these machines
"free".  (Just in passing I'd like to note that the direct result of making
CPU time on Crays "free" is that 90% of the CPU cycles get used by 10% of the
users, which can hurt time-to-market and reduce productivity.  Charging for
CPU time causes a vicious feedback loop where fewer users cause higher costs
which in turn cause fewer users, etc.  The Share Scheduler fixes much of this.)

I've felt for some time that there are fundamental reasons that large
computer system makers are still surviving, and in the case of CONVEX, growing
and prospering.  Even though the argument is made that faster machines improve
time-to-market, they are almost always used as timesharing systems, often
giving no better job turn-around time than workstations.  Some companies are
surviving because of the immense base of existing applications.  Some companies
prosper because of good customer service, some by finding vertical market
segments to dominate.  Every company has unique, non-architectural ways of
marketing products that may not have the best price/performance ratio.

However, I believe that there are several key strategic reasons that larger,
centralized/departmentalized computer systems will in the long run prevail
over the killer micros:

(1) A single computer user usually consumes CPU cycles irregularly.  A user
    often will have short periods of intense computer activity, followed by
    long periods of low utilization.  I've analyzed almost a year's worth of
    data from a typical engineering computer system (more than 500,000 data
    samples), and have seen that the number of jobs an individual (or group
    of individuals) runs at a time approximates a Poisson distribution.
    This matches what one would expect intuitively - that even heavily
    loaded systems have some percentage of their CPU cycles that go to the
    null process.  If J is the average number of jobs a person runs at any
    given time, then EXP(-J), the Poisson probability of having no runnable
    job, is the fraction of CPU cycles wasted on a single-user system.  For
    instance, if someone is performing a task where they are running 4 jobs
    at a time on average (sometimes 6, sometimes 2), then the workstation
    they are using will have EXP(-4), or about 2%, wasted cycles.  Similarly,
    an average of 1 job at a time gives about 37% wasted cycles, and 0.25
    jobs results in 78% wasted cycles.  I would
    maintain that the average number of runnable jobs on workstations is less
    than 0.1, resulting in greater than 90% wasted CPU cycles.  This statistical
    character of workloads provides strong economic incentives to people to
    pool their resources and purchase departmentalized/centralized computer
    resources.  A group of 20 people using a single machine will result in
    14% idle CPU time compared with 90% idle CPU time if they use 20
    workstations (assuming each user runs an average of 0.1 jobs at a time).
    This gives roughly a factor of 9 advantage in usable price/performance
    (86% versus 10% busy cycles) to the centralized/departmentalized machine.

(2) The argument for the centralization/departmentalization of disk resources
    closely parallels the argument for CPU resources.  If each user is given
    dedicated disks on workstations, then significant amounts of total disk
    space and total disk bandwidth go to waste.  There is significant
    economic incentive to centralizing/departmentalizing disk storage for
    this reason, as well as other reasons relating to data security and
    data archiving.

(3) I would maintain that the amount of memory needed by a job is roughly
    proportional to the amount of CPU time needed to run the job.  This is
    a very imprecise correlation, but is true to some degree across a wide
    range of problems.  I would also maintain that if an N-Megabyte program
    takes M seconds to run in N megabytes of physical memory, then it will
    take approximately 6*M seconds to run in N/2 megabytes of physical memory.
    This factor of 6 performance degradation holds true for a wide range of
    large memory application programs.  This gives a strong economic incentive
    to users to centralize/departmentalize their memory, and run large memory
    jobs in series.  For instance, assume two workstation users each have
    64 MBytes of memory and need to run 128 MByte jobs.  Assume these jobs
    take 12 hours apiece when run in 64 MBytes.  If the two workstation users
    put all 128 MBytes of memory on one workstation, and junked the second
    workstation, they could get both jobs done in 4 hours (2 hours per job)
    by running the two jobs in series on the large-memory workstation.  There
    is an additional economic incentive to centralizing memory that comes from
    the statistical nature of memory utilization by a group of users.  Using
    similar arguments to (1) above, you can easily show that a computing
    architecture with centralized/departmentalized high-speed memory is much
    more cost effective than distributing memory across multiple workstations.
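
The arithmetic in points (1) and (3) above can be checked with a short
Python sketch.  It is an illustration only, not the analysis behind the
measurements quoted in (1).  The function names are made up for this example,
each user's runnable-job count is assumed to be Poisson, users are assumed
to be independent (so the pooled job count is Poisson with the summed mean),
and the factor-of-6 slowdown is simply taken from (3) as a default parameter.

```python
import math


def idle_fraction(mean_jobs: float) -> float:
    """Probability of zero runnable jobs for a Poisson(mean_jobs) job count."""
    return math.exp(-mean_jobs)


def pooled_idle_fraction(users: int, mean_jobs_per_user: float) -> float:
    """Idle fraction of one shared machine.

    Assumption: users are independent, so their combined job count is
    Poisson with mean users * mean_jobs_per_user.
    """
    return idle_fraction(users * mean_jobs_per_user)


def usable_cycle_ratio(users: int, mean_jobs_per_user: float) -> float:
    """Busy fraction of the shared machine divided by that of one workstation."""
    shared_busy = 1.0 - pooled_idle_fraction(users, mean_jobs_per_user)
    single_busy = 1.0 - idle_fraction(mean_jobs_per_user)
    return shared_busy / single_busy


def serial_hours(jobs: int, hours_in_half_memory: float,
                 slowdown: float = 6.0) -> float:
    """Hours to run `jobs` one after another with full memory.

    Assumption (from point 3): halving physical memory multiplies run time
    by `slowdown`, so full memory divides the half-memory time by it.
    """
    return jobs * hours_in_half_memory / slowdown


if __name__ == "__main__":
    for j in (4, 1, 0.25, 0.1):
        print(f"J = {j}: idle fraction {idle_fraction(j):.3f}")
    print(f"20 users pooled: idle {pooled_idle_fraction(20, 0.1):.3f}")
    print(f"usable-cycle ratio: {usable_cycle_ratio(20, 0.1):.1f}")
    print(f"two jobs in series: {serial_hours(2, 12.0):.1f} hours")
```

Under these assumptions the 20-user pool idles about 14% of the time against
about 90% for each workstation, a usable-cycle ratio of about 9, and the two
128-MByte jobs finish in 4 hours when run in series on the combined memory.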

Obviously, there is much more involved in selecting the optimal computing
architecture for a given workload.  Just as I disagree with you that simple
measures of price/performance will predict the success or demise of a product,
many people would probably maintain that my arguments about centralizing
compute/disk/memory resources are also simplistic.  There are many
counter-arguments favoring distributed computing solutions, and many more
arguments favoring centralization.  The main point I wanted to make in this
note is that simple price/performance measures are poor predictors of the
long-term viability of a company's products.  I'm sure that most readers of
this newsgroup could post a long list of companies that had/have excellent
price/performance but that are/will be out of business.

Regards,
Ed Hamrick  (hamrick@convex.com)
Area Systems Engineer
CONVEX Computer Corporation