Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!pp!yoda!tomlic
From: tomlic@yoda.ACA.MCC.COM (Chris Tomlinson)
Newsgroups: comp.arch
Subject: Re: parallel systems
Message-ID: <358@yoda.ACA.MCC.COM>
Date: 19 Oct 89 13:57:32 GMT
References: <20416@princeton.Princeton.EDU>
Organization: MCC, Austin, TX
Lines: 65

From article <20416@princeton.Princeton.EDU>, by mg@notecnirp.Princeton.EDU (Michael Golan):
> In article <7651@bunny.GTE.COM> hhd0@GTE.COM (Horace Dediu) writes:
>>
>>Consider the 8k processor NCUBE 2--"The World's Fastest Computer." 
>>(yes, one of those).  According to their literature:
>>"8,192 64 bit processors each equivalent to one VAX 780.  It delivers
>>60 billion instructions per second, 27 billion scalar FLOPS, exceeding the
> 
> This imply a VAX 780 is a 7 mips machine ?

It is the architecture of the processor that is similar to the VAX (the ISA),
not the performance.
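
For reference, the per-node arithmetic behind the question can be checked
directly from the quoted literature. This is only a sketch: the figures are
the vendor's claims as quoted above, not measurements, and the variable
names are mine.

```python
# Figures as quoted from the NCUBE literature above (vendor claims).
NODES = 8192
TOTAL_IPS = 60e9      # "60 billion instructions per second"
TOTAL_FLOPS = 27e9    # "27 billion scalar FLOPS"

# Divide the aggregate rates evenly across all nodes.
mips_per_node = TOTAL_IPS / NODES / 1e6
mflops_per_node = TOTAL_FLOPS / NODES / 1e6

print(f"{mips_per_node:.2f} MIPS per node")
print(f"{mflops_per_node:.2f} MFLOPS per node")
```

That works out to a little over 7 MIPS per node, which is where the
question comes from.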

> 
>>performance of any other currently available or recently announced
>>supercomputer."  It's distributed memory .5MB per processor, runs UNIX, 
>                                       ^^^^^^^^^^^^^^^^^^^^^^
>>and is a hypercube.
> 
> .5MB ? And this is faster than a Cray? How many problems you can't even

I understand that NCUBE makes provision for up to 64MB per node on
those systems using the 64-bit processors. They have also apparently
incorporated a through-routing capability in the processors, similar to
that found on the Symult mesh-connected machines.
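
To put the two per-node memory figures side by side, here is the aggregate
for a full 8,192-node machine. A rough sketch only: it assumes binary
megabytes and that every node is populated identically, and the 64MB figure
is the one I understand to be the upper limit, not a confirmed
configuration.

```python
NODES = 8192
MB = 2**20
GB = 2**30

# 0.5 MB per node, as in the quoted literature.
base_total = NODES * MB // 2
# 64 MB per node, the assumed upper limit mentioned above.
max_total = NODES * 64 * MB

print(base_total / GB, "GB aggregate at 0.5 MB per node")
print(max_total / GB, "GB aggregate at 64 MB per node")
```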

> solve on this? And for how many, a 32Mb single VAX 780 will beat ?!
> One of the well known problems with Hypercubes is that if you look at a job
> that uses the whole memory (in this case 4Gb = Big Cray), a single machine 
> with the same performance of one processor (and all memory) will be almost 
> as good and sometimes even better.

The current trend in distributed-memory MIMD machines is toward very
low communication latencies compared with the first-generation
machines, which used software routing and slow communication hardware.
This tends to drive the machines toward shared-memory-like access
times. Physical limitations, of course, mean that a DM-MIMD machine
approximates shared memory worse and worse as it gets larger, but at
least the approach is scalable: the machine can get larger.
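
For a hypercube specifically, one way to see the scaling effect is hop
count: the minimum number of links between two nodes is the number of bits
in which their addresses differ, so the worst case grows as log2 of the
node count. The sketch below illustrates only the topology. It ignores
per-hop latency, contention, and whatever the through-routing hardware
buys, and the function names are mine.

```python
def hops(a: int, b: int) -> int:
    """Minimum link count between hypercube nodes a and b (Hamming distance)."""
    return bin(a ^ b).count("1")


def diameter(nodes: int) -> int:
    """Worst-case hop count in a hypercube of `nodes` nodes (a power of two)."""
    return nodes.bit_length() - 1


for n in (64, 1024, 8192):
    print(n, "nodes -> worst case", diameter(n), "hops")
```

The worst case grows slowly, but it does grow, which is the sense in which
the approximation to shared memory degrades with machine size.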

> 
> My original point was that MIMD, unless it has shared memory, is very hard
> to make use of with typical software/algorithms. Some problems can be solved
> nicely on a Hypercube, but most of them can not! And the state of the art

The state of the art in parallel algorithm development is advancing rapidly
as machines become available to experiment on.  It is more an issue of
algorithm design than of parallelizing sequential codes.  Quite a number
of problems are tackled on Crays because of their superior scalar
performance, without making significant use of the SIMD vector capabilities.
I would point to the development of BLAS-2 and -3 as an indication that even
on current supercomputers compiler technology just doesn't carry the day by
itself.

> in compilers, while having some luck with vectorized code, and less luck
> with shared memory code, has almost no luck with message-passing machines.
> 
> 
>  Michael Golan
>  mg@princeton.edu
> My opinions are my own. You are welcome not to like them.

Chris Tomlinson
tomlic@MCC.COM
--opinions....