Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!stat!stat.fsu.edu!mccalpin
From: mccalpin@masig3.ocean.fsu.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: ATTACK OF KILLER MICROS
Message-ID: <MCCALPIN.89Oct16141656@masig3.ocean.fsu.edu>
Date: 16 Oct 89 18:16:56 GMT
References: <35825@lll-winken.LLNL.GOV> <1081@m3.mfci.UUCP>
	<35896@lll-winken.LLNL.GOV>
Sender: news@stat.fsu.edu
Organization: Supercomputer Computations Research Institute
Lines: 54
In-reply-to: brooks@vette.llnl.gov's message of 15 Oct 89 18:20:48 GMT

In article <35896@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene
Brooks) writes:

>Microprocessor development is not ignoring vectorizable workloads.  The
>latest have fully pipeline floating point and are capable of pipelining
>several memory accesses.  As I noted, interleaving directly on the memory
>chip is trivial and memory chip makers will do it soon. [ ... more
> stuff deleted ... ]
>              They will do this with their commodity parts.

It is not at all clear to me that the memory bandwidth required for
running vector codes is going to be developed in commodity parts.  To
be specific, a single 64-bit vector pipe requires a sustained
bandwidth of 24 bytes per clock cycle.  Is an ordinary, garden-variety
commodity microprocessor going to be able to use six 32-bit words per
cycle of memory bandwidth on non-vectorized code?  If not,
then there is a strong financial incentive not to include that excess
bandwidth in commodity products....
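
One way to arrive at the 24-byte figure, as a rough sketch: assume the
pipe streams an operation like a(i) = b(i) + c(i), so each clock needs
two 64-bit operand loads and one 64-bit result store.  That breakdown is
an assumption, as are the function names and the 25 MHz clock used
below, which is a made-up number for illustration only.

```python
# Back-of-the-envelope bandwidth arithmetic for one vector pipe.
# ASSUMPTION: 24 bytes/clock = two operand loads + one result store of
# 64-bit words per clock, as in a(i) = b(i) + c(i).

WORD_BYTES_64 = 8   # one 64-bit operand
WORD_BYTES_32 = 4   # one 32-bit word, the unit used for the micro


def bytes_per_cycle(loads, stores, word_bytes=WORD_BYTES_64, pipes=1):
    """Sustained memory traffic, in bytes per clock, for `pipes` pipes."""
    return pipes * (loads + stores) * word_bytes


def sustained_mb_per_s(bytes_cycle, clock_mhz):
    """Bytes/clock times clocks/second, expressed in 10^6 bytes/second."""
    return bytes_cycle * clock_mhz


per_cycle = bytes_per_cycle(loads=2, stores=1)
print(per_cycle, "bytes per clock")                       # 24
print(per_cycle // WORD_BYTES_32, "32-bit words per clock")  # 6

# Hypothetical 25 MHz part, purely illustrative.
print(sustained_mb_per_s(per_cycle, 25.0), "MB/s sustained")  # 600.0
```

The point of the arithmetic is only that the requirement scales
linearly with clock rate and with the number of pipes, so a faster
micro needs proportionally more sustained bandwidth to keep one pipe
busy.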

In addition, the engineering/cost trade-off between memory bandwidth
and memory latency will continue to exist for the "KILLER MICROS" as
it does for the current generation of supercomputers.  Some users will
be willing to sacrifice latency for bandwidth, and others will be
willing to do the opposite.  Economies of scale will not eliminate
this trade-off, except perhaps by eliminating the companies that take
the less profitable position (e.g. ETA).

>Supercomputers of the future will be scalable multiprocessors made of
>many hundreds to thousands of commodity microprocessors.  They will
>be commodity parts because these parts will be the fastest around and
>they will be cheap. 

It seems to me that the industry's experience is that
general-purpose processors are not usually very effective in
parallel-processing applications.  There is certainly no guarantee
that the uniprocessors which are successful in the market will be
well-suited to the parallel supercomputer market -- which is not
likely to be a big enough market segment to have any control over what
processors are built....

The larger chip vendors are paying more attention to parallelism now,
but apparently only in the context of 2-4 processor parallelism.  It
is unlikely that these chips can be made to work together in
configurations of thousands simply by adding "glue" chips....

On top of that, the software technology for these
parallel supercomputers is depressingly immature.  I think traditional
moderately parallel machines (e.g. Cray Y/MP-8) will be able to handle
existing scientific workloads better than 1000-processor parallel
machines for quite some time....
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu
		   mccalpin@scri1.scri.fsu.edu
		   mccalpin@delocn.udel.edu