Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!sri-spam!ames!amdcad!amd!intelca!mipos3!omepd!intelisc!littlei!ogcvax!pase
From: pase@ogcvax.UUCP (Douglas M. Pase)
Newsgroups: comp.arch
Subject: Re: chewing up mips with graphics
Message-ID: <1323@ogcvax.UUCP>
Date: Thu, 25-Jun-87 17:17:02 EDT
Article-I.D.: ogcvax.1323
Posted: Thu Jun 25 17:17:02 1987
Date-Received: Sat, 27-Jun-87 09:48:08 EDT
References: <8270@amdahl.amdahl.com> <359@rocky2.UUCP> <6240@steinmetz.steinmetz.UUCP> <6328@beta.UUCP> <2120@dg_rtp.UUCP>
Reply-To: pase@ogcvax.UUCP (Douglas M. Pase)
Organization: Oregon Graduate Center, Beaverton, OR
Lines: 87

In article <2120@dg_rtp.UUCP> wood@dg_rtp.UUCP (Tom Wood) writes:
>Personally, I believe the 90% solution to obtaining parallelism is to
>take advantage of multiple independent computations.  (It's much easier
>to make 100 compiles go 100 times faster by using 100 machines than it
>is to make 1 machine go 100 times faster on each compile.)

I hope I'm not missing the point, but I think you're off by 10% -- that
is, the only approach to parallelism is, by definition, taking advantage
of multiple independent computations.  There are many levels to choose
from, not just one.  What you have mentioned here (with the 100
compiles) is parallelism at the process level.  This approach is the
easiest and the best understood, and many vendors provide commercial
products which successfully take advantage of this type of parallelism.
Honeywell has been doing this for years with CP-6, and Sequent and
Apollo are two newer entries.  (Oh, I see.  I bet this level is what you
meant by "independent" -- correct me if I still misunderstand.)  The
advantage of this level is that the overhead required to parallelize the
computation is relatively small and is controlled by the system (e.g. in
system locks and resource scheduling), not introduced into the
computation itself.
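As a rough illustration of the process level (not part of the original
exchange), the sketch below runs several independent jobs as separate
operating-system processes.  The job body and the names JOB, run_job and
run_all are hypothetical stand-ins for "a compile"; the point is only
that the jobs share nothing, so all coordination is left to the system.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one "compile": an independent OS process
# that shares no state with its siblings.
JOB = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


def run_job(n):
    """Launch one independent child process and return its result."""
    done = subprocess.run(
        [sys.executable, "-c", JOB, str(n)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(done.stdout)


def run_all(sizes, max_workers=4):
    """Run each job as its own process.

    The threads here do no computation; they only wait on the child
    processes, which the operating system schedules independently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, sizes))


if __name__ == "__main__":
    print(run_all([10, 100, 1000]))
```

Nothing inside the job knows it is running alongside others, which is
why the parallelization overhead stays outside the computation.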

The next level has multiple co-operating tasks, as in a
producer/consumer relationship and similar approaches.  At this level
the overhead is built directly into the application.  Sometimes the
overhead required to distribute sufficient information to run multiple
parallel tasks cancels any benefits that might have accrued.  Keller's
Readyflow system operates at this level.  The Cray X-MP, Sequent,
Alliant, and some other shared memory machines can be operated at this
level using some form of microtasking.  All distributed memory machines
(such as the Intel Hypercube and NCube's machine) are operated at this
level.  (By the way, "large grain dataflow" is at this level.)

Another level is the instruction level.  At this level, instructions are
scheduled independently and in parallel.  The MIT dataflow machine is an
example of this.  Operands accumulate in a "waiting-matching store"
until all operands required by an operator have arrived.  At that time
the operator and its operands are placed in a queue and executed as soon
as a processor becomes available.  The Manchester dataflow machine works
very similarly to the MIT machine.  The Cray machines also take
advantage of this level of parallelism.

Perhaps the bottom level is the microcode level.  Any machine (such as
the DEC 8600 series) which pipelines its microcode is executing in
parallel.  The Goodyear MPP is the machine which offers the most
parallelism at this level (although it's not exactly pipelined).  It
weighs in at 16K 8-bit processors.

In article johnw@astroatc.UUCP (John F. Wardale) writes:
>This may be true, but for most "real" problems, some well-known person
>determined that the average code spends 90% of its time executing 10%
>of its code.  This and other related studies show that a large fraction
>of problems have no, or very limited, parallelism.

As you have stated the study, it does *not* support your conclusion.
Tight loops in FORTRAN (shudder) programs may often be parallelized by
pipelining instructions.
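
A small sketch of the tight-loop point, added for illustration and not
taken from the thread: Python stands in for FORTRAN here, and the
function names are hypothetical.  The first loop has no dependence
between iterations, so the iterations may run in any order (and
therefore side by side); the second carries a value from one iteration
to the next, so reordering it naively changes the answer.  Running the
iterations backwards is used only as a cheap proxy for "some other
order".

```python
def independent_loop(b, c, order):
    """a[i] = b[i] + c[i]: no iteration reads what another one writes."""
    a = [0] * len(b)
    for i in order:
        a[i] = b[i] + c[i]
    return a


def carried_loop(b, order):
    """a[i] = a[i-1] + b[i]: each iteration needs the previous result."""
    a = [0] * len(b)
    for i in order:
        a[i] = (a[i - 1] if i > 0 else 0) + b[i]
    return a


def order_matters(loop, n, *args):
    """Return True if running the iterations in reverse changes the result."""
    forward = list(range(n))
    backward = forward[::-1]
    return loop(*args, forward) != loop(*args, backward)


if __name__ == "__main__":
    b = [1, 2, 3, 4]
    c = [10, 20, 30, 40]
    print(order_matters(independent_loop, 4, b, c))
    print(order_matters(carried_loop, 4, b))
```

Both loops are the same size; only the dependence between iterations
differs.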

Kuck's work on parallelizing compilers has shown that amazing
improvements can be gained by pipelining and vectorizing DO loops.
(Yes, both are forms of parallelism.)  It is the data dependencies which
determine the available parallelism, not the size of the code.

>A couple of months ago there was some discussion of somebody's
>challenge (with a moderate cash prize) ....  As I recall, you had to
>speed up a general problem (not limited to HIS problem set, but he
>could reject anything that was "embarrassingly parallel" [like the 100
>compiles example]) by a factor of 100, and you could use as many
>processors as you wanted.
>
>Did anyone save any of these?  Has anyone won the prize yet?
>
>			John W
>
>- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>Name:	John F. Wardale
>UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
>arpa:	astroatc!johnw@rsch.wisc.edu
>snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
>audio:	608-221-9001 eXt 110

This seems pretty easy to me.  Any fluid modeling or simulation problem,
such as numerical weather forecasting, dynamic air-flow analysis,
oceanographic simulation, planet/galaxy formation, gas dispersion, etc.,
would benefit a lot from just about any level of parallelism.  If this
problem set isn't sufficiently "real", how about finite-element
analysis, or image processing?  I would rather solve any of these
problems on the MPP than on any single 8-bit processor.
--
Doug Pase   --   ...ucbvax!tektronix!ogcvax!pase  or  pase@Oregon-Grad (CSNet)