Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!ames!pioneer!lamaster
From: lamaster@pioneer.arpa (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: What with these Vector's anyways?
Message-ID: <2425@ames.arpa>
Date: Fri, 31-Jul-87 13:30:49 EDT
Article-I.D.: ames.2425
Posted: Fri Jul 31 13:30:49 1987
Date-Received: Sun, 2-Aug-87 01:07:47 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP>
Sender: usenet@ames.arpa
Reply-To: lamaster@ames.UUCP (Hugh LaMaster)
Distribution: world
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 95
Keywords: scalar vs. vectors, benchmarks, Dhrystone, sorting

In article <279@diab.UUCP> pf@.UUCP (Per Fogelstrom) writes:
>In article <10956@amdahl.amdahl.com> littauer@amdahl.UUCP (Tom Littauer) writes:
>>In article <3636@well.UUCP> rchrd@well.UUCP (Richard Friedman) writes:
>>>The best supercomputers are fast scalar machines first, with vector
(omitted discussion about scalar perf. in supercomputers)
>
>Be serious! Do you really belive in Dhrystone? Okay i do admitt that we don't
>have anything better for the moment, but soon i hope.
(omitted discussion about Dhrystone)
>
>[ There are three types of lie, lie, damned lie, and BENCHMARKS !! ]
>-- 
>Per Fogelstrom,  Diab Data AB

I have to add something else to this discussion.  Ten years ago, when Crays
first came out, IBM was still trying to peddle the 370/168 and Amdahl had its
first faster machines.  Folks at the national labs started saying that the
Cray was not only the fastest, but also the most cost effective, for "scalar"
work.  They were right, at the time.  A lot of water has gone under the bridge
since then.  There wasn't much of a market for fast machines in the early and
mid 70's, but the last four or five years have changed all that.  Even IBM is
trying to keep up.  But, to get back to the question of scalar performance:

Suppose you want to buy the most cost effective machine for doing large sorts.
Ten years ago, that might have been a Cray.  Parallel Computing (Vol 4 1987 pp
49-61) recently had a comparison of sorting performance using scalar and
vector algorithms on big iron.  The scalar performance of some non-Cray big
machines has now caught up with Cray scalar performance (scalar Quicksort
being a good example of a scalar code).  Vector processors are being
incorporated in more "mainstream" mainframes (Amdahl 1200 examined in the
article, but also the IBM 3090 VF machines).  And there are now vectorized
sorting algorithms which can provide significant benefits for some cases.
Overall, for sorting the Amdahl 1200 appeared to have the advantage for scalar
and vector sorting over the Cray X-MP.
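
As a rough illustration of why some sorts vectorize and others do not, here
is a minimal sketch in Python.  It is not taken from the Parallel Computing
article, and the names quicksort and bitonic_sort are only illustrative.
Quicksort's inner loops advance by amounts that depend on the data, which is
what makes it "scalar" code.  A bitonic sorting network is one well-known
alternative in which the pairs compared at each stage depend only on the
stage, so all compare-exchanges within a stage are independent and could in
principle be issued as vector operations.

```python
def quicksort(a):
    """Scalar-style quicksort: how far i and j move depends on the data."""
    a = list(a)
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:      # data-dependent scan from the left
                i += 1
            while a[j] > pivot:      # data-dependent scan from the right
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    return a


def bitonic_sort(a):
    """Bitonic sorting network; the length must be a power of two."""
    a = list(a)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            # One stage: which pairs are compared depends only on (k, j),
            # and no element takes part in more than one pair.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending:
                        swap = a[i] > a[partner]
                    else:
                        swap = a[i] < a[partner]
                    if swap:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The network does more compare-exchanges than quicksort does comparisons
(on the order of n log^2 n), so it only pays off where the regular structure
can actually be exploited by the hardware.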

There are two points here.  The first is that, as more companies build fast
machines and vector architectures become mainstream, the members of the set
"supercomputers" are a bit harder to define (again) than they were ten years
ago.  The second is that even for traditional "business" "scalar" computing
like sorting, there are now vector algorithms which show significant
performance improvements over scalar algorithms.

Finally, the question of what makes a good benchmark:  

If you want to do a lot of sorting, sorting makes a good benchmark.
(Extrapolate to whatever you want to do).

The original purpose of Dhrystone was to produce a synthetic program that used
"recent statistics" for "real" programs.  Weicker's PROGRAM has been widely
criticized, but the STATISTICS behind it are probably valid for records and
pointers type code.  A new implementation of the code that prints results
which depend on the correct execution of all the code is certainly needed
(Dhrystone II?).
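
A minimal sketch of what "results which depend on the correct execution of
all the code" could look like, in Python.  This is not Dhrystone and does not
use Weicker's statistics; Record, build_ring, step, and run are hypothetical
names, and the constants 31 and 65521 are arbitrary.  The idea is only that
every iteration reads the running checksum and writes a record that is read
again later, so no part of the work can be dropped without changing the
printed value.

```python
import time


class Record:
    """A small record holding a value and a pointer to the next record."""
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


def build_ring(n):
    """Link n records into a ring and return the first one."""
    records = [Record(i) for i in range(n)]
    for i, rec in enumerate(records):
        rec.next = records[(i + 1) % n]
    return records[0]


def step(rec, checksum):
    """One unit of work: update the record, then fold it into the checksum."""
    rec.value = (rec.value * 31 + checksum) % 65521
    return (checksum + rec.value) % 65521


def run(n_records=64, passes=1000):
    """Walk the ring, one procedure call per record visited."""
    rec = build_ring(n_records)
    checksum = 1
    start = time.perf_counter()
    for _ in range(n_records * passes):
        checksum = step(rec, checksum)
        rec = rec.next
    elapsed = time.perf_counter() - start
    return checksum, elapsed


if __name__ == "__main__":
    checksum, elapsed = run()
    # The checksum is the result to compare across machines and compilers;
    # the time is only meaningful if the checksum matches.
    print("checksum", checksum, "seconds", elapsed)
```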

A problem with "small" benchmarks which depend on multiple passes over the
same data is that typically code and data can run cache contained, which is
also very artificial.  A new Dhrystone III benchmark which uses the same
statistics but has a much larger data area would be more appropriate for
testing big machines with lots of cache and memory.  
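
One way to see the cache-contained effect is to time the same access pattern
over data areas of different sizes.  The sketch below (Python, with the
hypothetical names make_cycle, chase, and time_per_access) follows pointers
around a random single-cycle permutation, so each access depends on the one
before it.  Interpreter overhead in Python will hide much of the difference;
the same structure in a compiled language should show it more clearly.

```python
import random
import time


def make_cycle(n, seed=1):
    """Return a permutation of range(n) that is one single cycle
    (Sattolo's algorithm), so a chase visits every slot before repeating."""
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)         # 0 <= j < i, never i itself
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def chase(perm, steps):
    """Follow the permutation for the given number of steps from slot 0."""
    idx = 0
    for _ in range(steps):
        idx = perm[idx]
    return idx


def time_per_access(n, steps=200_000):
    """Average seconds per dependent access over a data area of n slots."""
    perm = make_cycle(n)
    start = time.perf_counter()
    end_idx = chase(perm, steps)
    elapsed = time.perf_counter() - start
    return elapsed / steps, end_idx


if __name__ == "__main__":
    # Sizes are arbitrary; pick them to straddle the cache of the machine.
    for n in (1_000, 30_000, 1_000_000):
        per_access, _ = time_per_access(n)
        print(n, per_access)
```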

It should be noted that one thing that Dhrystone does do "right" is make lots
of procedure calls.  In my experience, on typical machines that are similar in
other respects, it is often the cost of procedure calls, comparisons, and
branches that determines the "apparent speed" of a scalar machine used for
scalar purposes.  The reason Dhrystone looks SO slow on the Cray is very
likely due to the much larger relative cost of procedure calls on the Cray
(and the CDC 6600, CDC 7600, Cyber 205, to name a few popular supercomputers).
This effect is real and is a valid result of Dhrystone, as long as the
compiler doesn't do true global optimization.
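
The cost of a procedure call can be isolated by timing the same work with and
without the call.  A minimal sketch in Python follows; add_one,
loop_with_calls, and loop_inlined are hypothetical names, and the ratio
measured here reflects the Python interpreter, not a Cray or any other
machine discussed above.  Only the shape of the measurement carries over.

```python
import timeit


def add_one(x):
    """The procedure whose call overhead is being measured."""
    return x + 1


def loop_with_calls(n):
    """Do the work through one procedure call per iteration."""
    total = 0
    for _ in range(n):
        total = add_one(total)
    return total


def loop_inlined(n):
    """Do the same work with the procedure body written in line."""
    total = 0
    for _ in range(n):
        total = total + 1
    return total


if __name__ == "__main__":
    n = 100_000
    with_calls = timeit.timeit(lambda: loop_with_calls(n), number=5)
    inlined = timeit.timeit(lambda: loop_inlined(n), number=5)
    print("with calls:", with_calls, "inlined:", inlined)
```

A compiler that does global optimization may turn the first loop into the
second on its own, which is exactly the caveat above.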

MIPS computers, and others, have been tending to use Un*x utilities to measure
the "general purpose" speed of machines.  This makes sense.  It should be
remembered, however, with respect to Dhrystone, that when it and previous
benchmarks were written there was no standard environment available for most
processors, as there is today.  Some (not on this net) still argue about it
today.


  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov


                 "IBM will have it soon"


(Disclaimer: "All opinions solely the author's responsibility")