Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!ihnp4!chinet!nucsrl!ram
From: ram@nucsrl.UUCP
Newsgroups: comp.arch
Subject: Re: Hypercubes (place in life)
Message-ID: <3810010@nucsrl.UUCP>
Date: Wed, 4-Mar-87 03:11:31 EST
Article-I.D.: nucsrl.3810010
Posted: Wed Mar 4 03:11:31 1987
Date-Received: Fri, 13-Mar-87 23:35:44 EST
References: <362@ames.UUCP>
Organization: Northwestern U, Evanston IL, USA
Lines: 94

Eugene Wrote:

>experience has been to the contrary. Heavy computing requires a well
>thought out (balanced) structure to prevent things like an I/O bottleneck.
>A hypercube is far from a typical end-user machine.

Quite True:

>The marketing hype which has surrounded hypercubes astounds me. It

[Talk of marketing hype. The hype reminds me of the AI field. "If you want to do serious work in AI you have to have a Symbolics or LMI or...". BS. I can do as well or better with a SUN. I have even heard interviewers telling me this. Do these guys start projects after reading the ad pages of AI magazine?]

Why doesn't anybody talk about any other network? I for one like the De Bruijn interconnection network. Any flames/appreciation related to this? True, the hypercube is by far the best-studied, most easily VLSIable, extensible network. But is an interconnection network the be-all and end-all of multiprocessors? No. It is far too early to judge that. Hypercubes, deltas, banyans or whatever network you choose: I/O bandwidth, routing protocols and network connectivity would limit the number of problems that would run faster (relative to the number of available processors) on these. I am not discounting cubes altogether (the TMI guys have shown some interesting pieces of jugglery with cubes). I guess we can agree that problems that are communication bound are bound to have problems with any sort of network.

Alternate solution: have enormous shared memory - Giga Giga Bytes. Here cache coherence is a major stumbling block.

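For anyone who wants to compare the two networks mentioned above, here is a rough sketch, not anybody's production router. It assumes the binary De Bruijn network on 2**dim nodes is treated as an undirected shift graph (links to the left-shift successors and right-shift predecessors of each node label); the function names are my own, and the diameter is found by brute-force breadth-first search, which is only sensible for small dim.

```python
from collections import deque


def hypercube_neighbors(node, dim):
    """Neighbors in a dim-dimensional binary hypercube: flip one bit at a time."""
    return [node ^ (1 << k) for k in range(dim)]


def de_bruijn_neighbors(node, dim):
    """Neighbors in the binary De Bruijn graph on 2**dim nodes, taken as undirected."""
    size = 1 << dim
    # Successors: shift the label left and append a 0 or a 1.
    shift_left = {(2 * node) % size, (2 * node + 1) % size}
    # Predecessors: shift the label right and prepend a 0 or a 1.
    shift_right = {node >> 1, (node >> 1) | (size >> 1)}
    # Drop self-loops (they occur at the all-zeros and all-ones labels).
    return sorted((shift_left | shift_right) - {node})


def max_degree(neighbors, dim):
    """Largest number of links at any single node."""
    return max(len(neighbors(n, dim)) for n in range(1 << dim))


def diameter(neighbors, dim):
    """Longest shortest path, by breadth-first search from every node."""
    worst = 0
    for start in range(1 << dim):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors(u, dim):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    dim = 6  # 64 nodes in both networks
    for name, fn in (("hypercube", hypercube_neighbors),
                     ("De Bruijn", de_bruijn_neighbors)):
        print(name, "max degree", max_degree(fn, dim), "diameter", diameter(fn, dim))
```

With dim = 6 the hypercube needs 6 links per node, while the De Bruijn graph never needs more than 4, and both come out with diameter 6. The hypercube's link count keeps growing with dim, whereas the De Bruijn link count stays fixed. This says nothing about routing, I/O or VLSI layout, which is where the real arguments are.
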
What classes of problems are more suitable for network-based machines? If we assume that a problem is decomposed into a number of subproblems/processes, problems that are embarrassingly parallelizable and have little inter-process communication would be best for the network class of machines. Heavily communication-bound problems would be better suited to shared memory. As shared memory is not viable for more than a single-digit (optimistic) # of PEs, what's the alternative?

Solution: Mix these two within a framework (I think Cedar has such characteristics) so that a few PEs share a common memory of Gigas and such clusters are interconnected. It is wasteful to set up communication (broadcast is a different issue altogether) to transfer just a few KBs from A to B. Such a framework would be more suitable for reducing the communication overhead relative to the overall transfer time. This is probably what Eugene means by a balanced design. It has a few plus points: fault tolerance is improved, and there are alternate communication channels.

Another common misconception is about vectorization. Vectorization does not mean speed-ups for numerical calculations alone. Chaining, short-stopping and overlapping provide considerable speed-ups in the form of reduced memory access cycles. Until today only FORTRAN programmers have had access to such machines (probably the CRAYs were dedicated to this Saintly Sect), and so that is how they have been used. Although vectorizing languages like 'C' is not as easy as vectorizing FORTRAN, vectorization is certainly a lot easier than auto-parallelization. [Gould has done some work on vectorizing C (wonder what happened to it), as have Kuck & Associates.]

I did some research as part of a team analysing a class of problems (ranging from bit manipulation, searching/sorting and tree manipulation to Fortran-like number juggling). Disregarding the underlying architecture, the analyses were separated into parallelizable sections and vectorizable sections.

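Going back to the remark about setting up communication for just a few KBs: a back-of-the-envelope model makes the point. The sketch below assumes the simplest possible cost model (a fixed setup time plus bytes divided by link bandwidth). The function names and the 1 ms / 1 MB/s figures are made up for illustration and are not measurements of any machine.

```python
def transfer_time(nbytes, setup_s, bytes_per_s):
    """Total time for one message: fixed setup cost plus payload time."""
    return setup_s + nbytes / bytes_per_s


def link_efficiency(nbytes, setup_s, bytes_per_s):
    """Fraction of the total transfer time spent actually moving data."""
    payload_s = nbytes / bytes_per_s
    return payload_s / (setup_s + payload_s)


def break_even_bytes(setup_s, bytes_per_s):
    """Message size at which setup time equals payload time (efficiency 0.5)."""
    return setup_s * bytes_per_s


if __name__ == "__main__":
    SETUP_S = 0.001          # assumed: 1 ms to set up a transfer
    BANDWIDTH = 1_000_000    # assumed: 1 MB/s link
    for size in (100, 1_000, 10_000, 1_000_000):
        eff = link_efficiency(size, SETUP_S, BANDWIDTH)
        print(f"{size:>9} bytes: {eff:.3f} of the time is useful transfer")
```

Under these assumed numbers a 1000-byte message spends half its time in setup, and only messages much larger than the break-even size use the link well. That is the argument for keeping the small, chatty traffic inside a shared-memory cluster and using the network for the large transfers.
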
Some problems were embarrassingly parallelizable and some had little parallelism (a solution tree - a typical Prolog search tree), but the amount of speed-up from vectorization is considerable in almost all problems. (No wonder there are so many vector CPU designs in the works today.)

If we build huge CPUs that crunch data at an alarming rate, communication latencies are going to limit their loading capabilities. If we build small CPUs that overlap communication with computation, effectively there is a speed-up (CM), but there is a limit to the CPU size and number, dictated by the problems and the interconnection complexity. Thus there is also a tradeoff between CPU power and interconnection type. In retrospect, the choice of Intel chips for the Caltech machine was probably not the best.

Another problem for these network-based multiprocessors is the initial distribution of data and the final collection of results. Almost everybody ignores them in the analysis; I think they are significant and have to be included.

renu raman
....ihnp4!nucsrl!ram
Northwestern Univ. Comp. Sci. Res. lab
Evanston IL 60201

Thanks to Ollie, Iran has agreed to spend Zillions in supercomputer research. How philanthropic of those guys.

Why is it that people who have used iPSC have either turned HYPER or are in a COSMIC trance :-)