Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!ihnp4!chinet!nucsrl!ram
From: ram@nucsrl.UUCP
Newsgroups: comp.arch
Subject: Re: Hypercubes (place in life)
Message-ID: <3810010@nucsrl.UUCP>
Date: Wed, 4-Mar-87 03:11:31 EST
Article-I.D.: nucsrl.3810010
Posted: Wed Mar  4 03:11:31 1987
Date-Received: Fri, 13-Mar-87 23:35:44 EST
References: <362@ames.UUCP>
Organization: Northwestern U, Evanston IL, USA
Lines: 94


Eugene wrote:

>experience has been to the contrary.  Heavy computing requires a well
>thought out (balanced) structure to prevent things like an I/O bottleneck.
>A hypercube is far from a typical end-user machine.

  Quite true.

>The marketing hype which has surrounded hypercubes astounds me.  It

  [Talk of marketing hype: it reminds me of the AI field.  "If you want
to do serious work in AI you have to have a Symbolics or LMI or...".  BS.
I can do as well or better with a SUN.  I have even heard this line from
interviewers.  Do these guys start projects after reading the ad pages
of AI magazine?]

  Why doesn't anybody talk about any other network?  I for one like the
De Bruijn interconnection network.  Any flames/appreciation related to this?
True, the hypercube is by far the best-studied, most easily VLSI-able,
extensible network.  But is an interconnection network the be-all and
end-all of multiprocessors?  No.  It is far too early to judge that.
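
   For anyone who wants to compare the two topologies concretely, here is a
small Python sketch (mine, with made-up function names, not taken from any
machine's software).  It builds neighbor lists for a binary hypercube and a
binary De Bruijn network on 2**n nodes, treats the De Bruijn links as
bidirectional (an assumption), and measures the diameter by breadth-first
search.

```python
from collections import deque


def hypercube_neighbors(node, n):
    """Neighbors in an n-dimensional binary hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(n)]


def de_bruijn_neighbors(node, n):
    """Neighbors in a binary De Bruijn network on 2**n nodes.

    A link goes from x to (x shifted left by one, with a new low bit).
    Links are treated as bidirectional here (an assumption), so the
    shift-right predecessors are included as well.
    """
    mask = (1 << n) - 1
    successors = {((node << 1) & mask) | b for b in (0, 1)}
    predecessors = {(node >> 1) | (b << (n - 1)) for b in (0, 1)}
    return sorted((successors | predecessors) - {node})


def max_degree(neighbors, n):
    """Largest number of distinct links on any single node."""
    return max(len(neighbors(u, n)) for u in range(1 << n))


def diameter(neighbors, n):
    """Longest shortest path between any two nodes, by BFS from every node."""
    worst = 0
    for start in range(1 << n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors(u, n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    n = 6  # 64 nodes
    for name, fn in (("hypercube", hypercube_neighbors),
                     ("de bruijn", de_bruijn_neighbors)):
        print(name, "degree", max_degree(fn, n), "diameter", diameter(fn, n))
```

For n = 6 both networks have a diameter of 6, but the hypercube needs 6
links per node against at most 4 for De Bruijn, and the De Bruijn figure
does not grow as nodes are added.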

   Whether you choose hypercubes, deltas, banyans or any other network,
I/O bandwidth, routing protocols and network connectivity will limit the
number of problems that run faster (relative to the number of available
processors) on it.  I am not discounting cubes altogether (TMI guys have
shown some interesting pieces of jugglery with cubes).
I guess we can agree that problems that are communication bound are
bound to have problems with any sort of network.  Alternate solution:
have enormous shared memory - Giga Giga Bytes.  Here cache coherence
is a major stumbling block.

   What classes of problems are more suitable for network-based machines?
If we assume that a problem is decomposed into a number of sub-problems/
processes, then problems that are embarrassingly parallel, with little
inter-process communication, are the best fit for the network class of
machines.  Heavily communication-bound problems are better suited to
shared memory.  But as shared memory is not viable for more than a
single-digit (and that is optimistic) number of PEs, what's the
alternative?

   Solution: Mix the two within one framework (I think Cedar has such
characteristics), so that a few PEs share a common Gigas of memory and such
clusters are interconnected.  It is wasteful to set up communication
(broadcast is a different issue altogether) to transfer just a few KBs
from A to B.  Such a framework would be more suitable because it reduces
the communication overhead relative to the overall transfer time.
This is probably what Eugene means by a balanced design.  It has a few other
plus points: fault tolerance is improved, along with the availability of
alternate communication channels.
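
   To put a number on the setup overhead, here is a back-of-envelope sketch
in Python.  It assumes the usual linear cost model, time = startup + bytes /
bandwidth.  The startup and bandwidth figures are made-up placeholders, not
measurements of any machine.

```python
def transfer_time(nbytes, startup_s, bandwidth_bytes_per_s):
    """Total time to move nbytes: fixed setup cost plus time on the wire."""
    return startup_s + nbytes / bandwidth_bytes_per_s


def startup_fraction(nbytes, startup_s, bandwidth_bytes_per_s):
    """Share of the total transfer time that is pure setup overhead."""
    return startup_s / transfer_time(nbytes, startup_s, bandwidth_bytes_per_s)


STARTUP_S = 1e-3   # hypothetical: 1 ms to set up a transfer
BANDWIDTH = 1e6    # hypothetical: 1 MB/s once the transfer is running

for nbytes in (1_000, 10_000, 100_000, 1_000_000):
    frac = startup_fraction(nbytes, STARTUP_S, BANDWIDTH)
    print(f"{nbytes:>9} bytes: {100 * frac:5.1f}% of the time is setup")
```

With these placeholder figures a 1 KB message spends half its time in setup,
while a 1 MB transfer spends about a tenth of a percent there.  That is the
case for keeping small exchanges inside a shared-memory cluster.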

   Another common misconception is about vectorization.  Vectorization does
not mean speed-ups for numerical calculations alone.
Chaining, short-stopping and overlapping provide considerable speed-ups in
the form of reduced memory access cycles.  Until now only FORTRAN
programmers have had access to such machines (probably the CRAYs were
dedicated to this Saintly Sect), hence the numerical-only use.  Although
vectorizing a language like C is not as easy as vectorizing FORTRAN,
vectorization is certainly a lot easier than auto-parallelization.
[Gould has done some work on vectorizing C (wonder what happened to it), as
has Kuck & Associates].
    
   I did some research as part of a team analysing a class of
problems (ranging from bit manipulation, searching/sorting and tree
manipulation to Fortran-like number juggling).
Disregarding the underlying architecture, we separated each analysis
into parallelizable sections and vectorizable sections.  Some problems
were embarrassingly parallel and some offered little parallelism
(a solution tree, e.g. a typical Prolog search tree),
but the speed-up from vectorization was considerable in almost all
problems.  (No wonder there are so many vector CPU designs in the works
today.)

   If we build huge CPUs that crunch data at an alarming rate, communication
latencies are going to limit how well they can be kept loaded.  If we build
small CPUs that overlap communication with computation, there is effectively
a speed-up (CM), but there is a limit to the CPU size and number, dictated
by the problems and the interconnection complexity.  Thus there is also
a tradeoff between CPU power and interconnection type.  In retrospect, the
choice of Intel chips for the Caltech machine was probably not the best.
Another problem for these network-based multiprocessors is the initial
distribution of data and the final collection of results.  Almost everybody
ignores them in the analysis; I think they are significant and have to be
included.
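
   A crude way to see why distribution and collection matter, as a Python
sketch.  The model is my own simplification: the compute part divides
perfectly across p processors, while the initial distribution and final
collection are serial, fixed costs.  All figures are hypothetical.

```python
def speedup(t_serial, p, t_distribute=0.0, t_collect=0.0):
    """Speed-up over one processor when data movement is counted.

    t_serial      time for the whole job on one processor
    p             number of processors (compute assumed to split evenly)
    t_distribute  time to hand out the initial data (assumed serial)
    t_collect     time to gather the results (assumed serial)
    """
    t_parallel = t_distribute + t_serial / p + t_collect
    return t_serial / t_parallel


ideal = speedup(100.0, 64)                 # data movement ignored
realistic = speedup(100.0, 64, 2.0, 2.0)   # 2 s each way, hypothetical
print(f"ideal {ideal:.1f}x, with distribute/collect {realistic:.1f}x")
```

With 100 s of serial work on 64 processors the ideal speed-up is 64.  Adding
just 2 s each for distribution and collection drops it to about 18.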


				      renu raman
				....ihnp4!nucsrl!ram
				Northwestern Univ. Comp. Sci. Res. lab
				   Evanston  IL  60201


Thanks to Ollie, Iran has agreed to spend Zillions on supercomputer research.
How philanthropic of those guys.

Why is it that people who have used the iPSC have either turned HYPER or are
in a COSMIC trance :-)