Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site lanl.ARPA
Path: utzoo!watmath!clyde!bonnie!akgua!gatech!seismo!cmcl2!lanl!jlg
From: jlg@lanl.ARPA
Newsgroups: net.arch
Subject: Re: 128Mb - I give up!
Message-ID: <34416@lanl.ARPA>
Date: Fri, 6-Dec-85 20:57:40 EST
Article-I.D.: lanl.34416
Posted: Fri Dec  6 20:57:40 1985
Date-Received: Sun, 8-Dec-85 03:26:31 EST
References: <285@frog.UUCP> <34249@lanl.ARPA> <696@unc.unc.UUCP>
Reply-To: jlg@a.UUCP (Jim Giles)
Organization: Los Alamos National Laboratory
Lines: 49

> > Actually, I can't remember a time when the fastest machines on the
> > market had virtual memory.  Page swapping can, at best, improve
> > throughput (usually not).  Page swapping is almost guaranteed to degrade
> > turn-around of individual tasks.
>
> The Cyber 203 & 205 which can outperform the Crays on most good
> days do have virtual memory.

I guess by 'good days' you mean those days when the only code you run is
for very long vectors in highly vectorized code or is code that has been
VERY carefully optimized for Cybers.  I've seen a lot of benchmarks of
both machines (I work with several different vintages of Crays on a
daily basis - and most of the people I work with are interested in only
one thing - SPEED).  The Cyber does very well on specific kinds of
problems involving long vectors.  It also does reasonably well on codes
that have been carefully tailored for Cyber machines (e.g. standard
benchmark sets like the 'Livermore Loops').  The Cyber does consistently
worse than Crays on short vectors, scalar code, and code that hasn't
been recoded for the specific machine - and that includes most
production codes at most of the major labs.

The problem is that vector setup time on Cybers is enormous.  You are
right that the asymptotic speed of Cybers is higher than that of the
older Crays, but that only applies to brief spurts of pure vector code.
This extreme vector setup time means that short vectors don't run very
fast at all (e.g. multiplying two 3x3 matrices is not very efficient on
Cybers).  Long vectors, where the pipeline time dominates, run very fast
indeed.  Real production codes have a heterogeneous mix of vector
lengths, as well as a lot of inherently scalar code for which the Cyber
doesn't compete well at all.
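
To make the setup-time argument concrete, here is a rough sketch of the
usual linear timing model: time for a vector of length n is a fixed
startup cost plus n times the per-element cost.  The numbers below are
made up for illustration only (they are NOT measured Cyber or Cray
figures), and the function names are my own.

```python
# Linear vector timing model: T(n) = startup + n * per_element.
# All cycle counts here are hypothetical, chosen only to show the shape
# of the tradeoff; they are not measurements of any real machine.

def vector_time(n, startup, per_element):
    """Cycles to process a vector of n elements."""
    return startup + n * per_element

def effective_rate(n, startup, per_element):
    """Elements per cycle actually achieved at length n."""
    return n / vector_time(n, startup, per_element)

def half_performance_length(startup, per_element):
    """Vector length at which half the asymptotic rate is reached."""
    return startup / per_element

def crossover_length(a, b):
    """Length at which machines a and b take the same time."""
    (startup_a, per_a), (startup_b, per_b) = a, b
    return (startup_a - startup_b) / (per_b - per_a)

# (startup cycles, cycles per element) -- assumed values
LONG_STARTUP = (50.0, 0.5)   # slow to start, fast once streaming
SHORT_STARTUP = (5.0, 1.0)   # quick to start, slower asymptotically

for n in (3, 9, 64, 1000):
    long_rate = effective_rate(n, *LONG_STARTUP)
    short_rate = effective_rate(n, *SHORT_STARTUP)
    print(f"n={n:5d}  long-startup={long_rate:.3f}  "
          f"short-startup={short_rate:.3f}  elements/cycle")

print("crossover length:", crossover_length(LONG_STARTUP, SHORT_STARTUP))
```

With these assumed numbers the long-startup machine only wins for
vectors longer than the crossover length; below that, the startup cost
swamps its faster streaming rate.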

Meanwhile, vector setup time on the Cray is always short and predictable,
even for data that is not contiguous and (with the X/MP) even for
gather/scatter operations.  This means that short vectors (which
constitute a large proportion of many codes) run nearly as efficiently
as long vectors.  Generally, for most codes with heterogeneous mixes of
vector lengths, older Crays run slightly faster than Cybers - new Crays
(X/MPs, Cray II) run much faster.

The virtual memory is actually a large part of the speed degradation.
In order to run vectors efficiently, the vector must not span page
boundaries.  This means that each new vector operation must have its
data moved around in memory so that page faults don't occur from the
vector unit.  If the Cyber had very large central memory instead of
virtual memory, it would almost certainly be a faster machine (and would
therefore compete better than it has).
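
As a small illustration of the page-boundary point, the sketch below
counts how many pages a strided vector touches, given its base address,
length and stride in words.  The page size used is an arbitrary
assumption for the example (not the Cyber's actual page size), and the
function names are hypothetical.

```python
# Count the distinct pages touched by a strided vector.
# PAGE_WORDS is an assumed page size for illustration only.

PAGE_WORDS = 512

def pages_touched(base, n, stride, page_words=PAGE_WORDS):
    """Number of distinct pages holding the n elements of the vector."""
    return len({(base + i * stride) // page_words for i in range(n)})

def spans_page_boundary(base, n, stride, page_words=PAGE_WORDS):
    """True if the vector's elements fall on more than one page."""
    return pages_touched(base, n, stride, page_words) > 1

# A page-aligned contiguous vector that exactly fills one page.
print(pages_touched(0, 512, 1))
# The same vector shifted by half a page now straddles a boundary.
print(pages_touched(256, 512, 1))
# A short vector with a page-sized stride lands on a new page per element.
print(pages_touched(0, 8, 512))
```

The last case is the unpleasant one: even a very short non-contiguous
vector can touch as many pages as it has elements.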

J. Giles
Los Alamos