Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site oakhill.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxb!mhuxn!mhuxm!mhuxj!houxm!whuxlm!akgua!sdcsvax!sdcrdcf!hplabs!hao!seismo!ut-sally!oakhill!davet
From: davet@oakhill.UUCP (Dave Trissel)
Newsgroups: net.arch
Subject: Re: Re: Caltech's Cosmic Cube
Message-ID: <333@oakhill.UUCP>
Date: Fri, 8-Feb-85 04:50:31 EST
Article-I.D.: oakhill.333
Posted: Fri Feb 8 04:50:31 1985
Date-Received: Wed, 13-Feb-85 02:16:21 EST
Organization: Motorola Inc. Austin, Tx
Lines: 95

>>Dec 27's Electronic Design makes reference to a 64-node parallel processor
>>using 8086/87's having solved a high-order physics problem which, heretofore,
>>folk had only had the temerity to try out on a Cray.
>> I'm curious. Anyone know about this or know literature references?

>A machine consisting of 16 x {8086, 8087, 256kb} is known as a "Mark II".
>The architecture encourages (2^N)-node networks by making the maximum distance
>between nodes to be N links; hence, "hypercube". I understand that different
>configurations of the Mark II are being built, up to possibly 128-node.

I think it is important to size up the claims made for the power of multiple
microprocessors tied together in ANY configuration. First let's look at the
raw power available. The 8086 at 10 MHz (its highest rated speed) can do at
most 1.25 million integer operations per second (that's a 32-bit
register-to-register ADD). The 8087 performs floating-point ADD and SUBTRACT
at 20 us a shot (MUL is around 30 us and DIV around 40 us) at its highest
rated speed of 5 MHz. (Let's be good guys and forget for the moment that the
8086 cannot run faster than the 8087, which means it must run at 5 MHz, which
lowers its 32-bit integer add rate to .625 MIPS.) Now the CRAY runs (I am
quoting from memory but I don't think that I'm going to be far off) scalar
rates of 30 Megaflops and vector rates of over 80.
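
To put those rates side by side, here is a minimal Python sketch. It uses
only the ballpark figures quoted above (20 us per 8087 ADD/SUB, 30 Megaflops
scalar for the CRAY, and the 64-node machine from the quoted article). The
function and constant names are invented for illustration, and the sketch
assumes perfect linear scaling with no interconnect overhead or idle time.

```python
# Ballpark figures quoted in the article; none of them are measurements.
CRAY_SCALAR_FLOPS = 30_000_000  # "30 Megaflops" scalar, quoted from memory
I8087_ADD_NS = 20_000           # 8087 floating ADD/SUB: 20 us at 5 MHz

NS_PER_SECOND = 1_000_000_000


def units_to_match(target_ops_per_sec, op_time_ns):
    """Units needed to reach the target rate, assuming perfect scaling.

    Each unit completes one operation every op_time_ns nanoseconds, so
    it delivers NS_PER_SECOND / op_time_ns operations per second.
    """
    return target_ops_per_sec * op_time_ns / NS_PER_SECOND


def fraction_of_target(n_units, target_ops_per_sec, op_time_ns):
    """Share of the target rate that n_units deliver together."""
    return n_units / units_to_match(target_ops_per_sec, op_time_ns)


if __name__ == "__main__":
    # 8087s needed to equal the quoted CRAY scalar rate.
    print(units_to_match(CRAY_SCALAR_FLOPS, I8087_ADD_NS))
    # Share of that rate delivered by a 64-node cube.
    print(round(fraction_of_target(64, CRAY_SCALAR_FLOPS, I8087_ADD_NS), 3))
```

Under those assumptions the first figure is 600.0 and the second about
0.107, i.e. roughly a tenth.
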
At the scalar rate of 30 Megaflops, and assuming no interconnect overhead or
idle-time penalties on any of the 8087s, it would take about 600 8087s to
match the floating-point power of a CRAY! That's right --- 600! Even if the
cube had an array of 64 8086/8087 pairs its power would only be about one
tenth that of a CRAY. (Cost-wise though, 600 8086/8087 pairs would only run
about 200 grand - substantially cheaper than the CRAY.) Assuming the same
30 MIPS figure for the CRAY integer processing it would only take about 50
8086's (at 10 MHz) to match the CRAY.

Even though these are ballpark figures, I think the conclusion to be had is
quite obvious. The cube does not approach the power of a CRAY.

>The next version, a "Mark III", is tentatively set to be 64 x {16 mhz 68020,
>68881, 1-4mb } for delivery in 1987. For my purposes (massive discrete event
>simulations) that begins to look interesting. I've heard claims that the
>68020/68881 pair is faster than a VAX-11/780...can someone comment on this?

Well, true and false. At non-floating-point operations the '020 runs from
20 percent to 80 percent faster than the 780. For floating-point (DEC gives
out no timings) we figure the 780 is slightly faster for single precision,
slightly slower for double and extended, and moderately slower at
transcendentals. So the result is that the MC68020/881 combination is from
about the same to 80 percent faster than the VAX 11/780, depending upon what
you are doing.

Let's make the same ballpark comparison with the CRAY. Floating ADD/SUB is
about 2.3 us on the MC68881. That still means you would need about 44 881s
to match the power of the CRAY's 30 Megaflops. This is a little more
encouraging, as forty-four of something is more manageable than 600 of
something. The MC68020 runs 32-bit register-to-register operations at an
impressive 8 MIPS, which would indicate that only four MC68020's would be
needed to approach the integer power of a CRAY. (I am assuming a 30 MIPS
figure here for the CRAY.
Corrections welcomed from those in the know. Sorry but my CRAY manual is in
storage.)

Fermilab, near Chicago, has a serious proposal to build a CRAY-power-
equivalent MC68020 multi-processor system. I have seen their prototype
running on MC68000s and it, along with the software they have developed, is
truly impressive. They are running ABSOFT FORTRAN on each node with a
VAX 780 controlling the whole thing. However, their nodes do not seem to be
as closely coupled as those described here for the Cube. I will post a
synopsis of that machine if people are interested.

>I've also heard a rumor that a major firm plans to market its own Intel
>386-based hypercube. I don't know enough about the 386 performance or
>schedule to know when this would be or whether the 68020 would be better.

<<>>

We at Motorola have heard rumors that it is in for its fourth redesign and
that now the on-chip instruction cache is being abandoned. I fail to see how
any high performance chip can be effective without an on-chip cache of some
type. (The EDN benchmarks on the MC68020 show an over 25 percent improvement
when the cache is turned on.) Intel's sales pitch may give a clue about the
386's status. It is a polished presentation which attempts to prove that you
don't need 32 bits for anything, and that the MC68020 is overkill.

>The problem of effectively using this computing power is non-trivial
>(ask the folks with Illiac IV). ...
> Joel West
> CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037
> jww@bonnie.UUCP (ihnp4!bonnie!jww)
> westjw@nosc.ARPA

I would have agreed 100 percent with that statement before I saw the
Fermilab demo. Now I'm not so sure. It may be non-trivial, but now I don't
think it's too difficult to tackle either.

Of course, all responses welcome.

Motorola Semiconductor Inc.              Dave Trissel
Austin, Texas          {ctvax,siesmo,gatech,ihnp4}!ut-sally!oakhill!davet
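
P.S. For anyone who wants to rerun the ballpark numbers, a small Python
sketch follows. It uses only the figures quoted in this article (2.3 us per
MC68881 ADD/SUB, 8 MIPS for the MC68020, 1.25 and .625 MIPS for the 8086,
and an assumed 30 Megaflops / 30 MIPS for the CRAY). The names are invented
for illustration, and perfect linear scaling with no interconnect overhead
is assumed throughout.

```python
# Ballpark figures quoted in the article; none of them are measurements.
CRAY_SCALAR_FLOPS = 30_000_000  # 30 Megaflops scalar, quoted from memory
CRAY_INTEGER_OPS = 30_000_000   # assumed 30 MIPS integer rate

NS_PER_SECOND = 1_000_000_000


def units_from_op_time(target_ops_per_sec, op_time_ns):
    """Units needed when each unit takes op_time_ns per operation."""
    return target_ops_per_sec * op_time_ns / NS_PER_SECOND


def units_from_rate(target_ops_per_sec, unit_ops_per_sec):
    """Units needed when each unit sustains unit_ops_per_sec."""
    return target_ops_per_sec / unit_ops_per_sec


if __name__ == "__main__":
    # MC68881 floating ADD/SUB at 2.3 us (2,300 ns) per operation.
    print("68881:", units_from_op_time(CRAY_SCALAR_FLOPS, 2_300))
    # MC68020 at 8 MIPS, register-to-register.
    print("68020:", units_from_rate(CRAY_INTEGER_OPS, 8_000_000))
    # 8086 at 10 MHz (1.25 MIPS) and at the 8087-limited 5 MHz (.625 MIPS).
    print("8086 @ 10 MHz:", units_from_rate(CRAY_INTEGER_OPS, 1_250_000))
    print("8086 @ 5 MHz:", units_from_rate(CRAY_INTEGER_OPS, 625_000))
```

Taken at face value, 2.3 us per operation works out to 69 MC68881s rather
than 44, and the 8086 count is 24 at 1.25 MIPS or 48 at .625 MIPS, so the
"about 50" figure matches the 5 MHz rate. The MC68020 figure of 3.75 agrees
with "only four".
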