Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!uwvax!astroatc!johnw
From: johnw@astroatc.UUCP (John F. Wardale)
Newsgroups: comp.arch
Subject: Re: chewing up mips with graphics [parallel computing]
Message-ID: <338@astroatc.UUCP>
Date: Thu, 25-Jun-87 17:38:30 EDT
Article-I.D.: astroatc.338
Posted: Thu Jun 25 17:38:30 1987
Date-Received: Sat, 27-Jun-87 04:00:09 EDT
References: <8270@amdahl.amdahl.com> <359@rocky2.UUCP>
Reply-To: johnw@astroatc.UUCP (John F. Wardale)
Organization: Astronautics Technology Cntr, Madison, WI
Lines: 77
Keywords: parallel computing; performance; vector computing

In article <5793@think.UUCP> bradley@godot.think.com.UUCP (Bradley Kuszmaul) writes:
>In article <337@astroatc.UUCP> I (John F. Wardale) wrote:
>>...
>Actually, I interpret the 90%-10% rule as indicating that there might be
>a lot of parallelism in problems. Since 90% of the time is spent
>executing a very small amount of code, it seems likely that there is a
>lot of data involved. It is also likely that the dependencies between
>data are something smaller than "completely connected", and so it is
>likely that different parts of the data can be processed in parallel.

Comments:
1) The 90-10 rule was not well explained.
2) The "lots of data" part I agree with.
3) The lack of dependencies does *NOT* follow!

re: 1: If you parallelize by machine (i.e. the compiler does it), your gains come from a small section of code. If you parallelize by hand, you have to KNOW which 10% to work on. Look at the theoretical (i.e. guaranteed-not-to-exceed) performance of vector machines (like a Cray) and at the actual performance on real code. (Try the LINPACK benchmark if you believe in benchmarks.) While it *IS* true that there *ARE* problems that get 50+% of peak, most tend to get *MUCH* less. Why? Because scalar performance frequently dominates: the code that doesn't vectorize still runs at scalar speed, and it ends up eating most of the run time.
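The "scalar performance frequently dominates" argument is Amdahl's Law, which comes up again later in the thread. Below is a minimal Python sketch, assuming a hypothetical machine whose vector unit runs a fixed 10x faster than its scalar unit, and taking "peak" to mean running everything at vector speed. The 10x ratio and the function names are made up for illustration; they are not Cray figures.

```python
def amdahl_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speedup when only `vector_fraction` of the original run time
    is accelerated by `vector_speedup`; the rest still runs at scalar speed."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    scalar_time = 1.0 - vector_fraction          # untouched by the vector unit
    vector_time = vector_fraction / vector_speedup
    return 1.0 / (scalar_time + vector_time)


def fraction_of_peak(vector_fraction: float, vector_speedup: float) -> float:
    """Delivered speed as a fraction of peak, where peak is assumed to be
    `vector_speedup` times scalar speed (a simplifying assumption)."""
    return amdahl_speedup(vector_fraction, vector_speedup) / vector_speedup


if __name__ == "__main__":
    # Hypothetical 10x vector/scalar ratio, not a measured machine figure.
    for f in (0.5, 0.9, 0.99):
        s = amdahl_speedup(f, 10.0)
        print(f"vectorized fraction {f:.2f}: speedup {s:.2f}x, "
              f"{100 * fraction_of_peak(f, 10.0):.0f}% of peak")
```

With these assumed numbers, code that is 90% vectorized gets a speedup of about 5.3x, roughly 53% of peak, and no matter how fast the vector unit becomes, the remaining 10% of scalar time keeps the speedup below 10x.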
re: 3: When scheduling code for pipelined computers, several studies of real programs (sorry, no refs) have shown that the number of registers needed for complete register allocation was in the twenties (i.e. 16 is pretty good, 32 is overkill).
--> Pipelining is parallelism (though admittedly limited), so why should we believe that multiple CPUs can extract more parallelism than this?

Code blocks can be classified as:
1: parallelizable and vectorizable [example: zero an array]
2: parallelizable [example: a=5; b=6]
3: vectorizable
4: neither

An example of #4 is an orbital differential-equation "shooting" problem: you have initial conditions and a set of differential equations. In a loop you:
    increment time a little bit
    calculate [sequentially dependent] values for acceleration, velocity, and position
end-loop
If anyone can vectorize or parallelize this, call me; I'll split the profits with you. :-)

> The lesson is "Rules of thumb, such as Amdahl's Law, don't
> have much to say about parallel computing".

Personally, I think such rules WILL apply to parallel computations, and can also be used to crudely estimate how much parallelism can be extracted from average codes. I suspect it will be a one- or low-two-digit number.

As for >1000-processor machines: if each could talk to >100 other processors, maybe the AI people could build a real-time (or faster) human brain simulator! There are also enough "embarrassingly parallel" problems that such machines will have a good market, though not as wide a market as the real (and small) "Cray-like" machines.

BTW: It would be wonderful if I'm wrong and someone finds a way to effectively split *ANY* problem to run on >100 processors in parallel, but I won't believe it until I see it.

John W
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name:   John F. Wardale
UUCP:   ... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:  5800 Cottage Gr. Rd.
        Madison WI 53716
audio:  608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!
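The class-4 orbital "shooting" loop described in the post can be sketched as below. This is a minimal Python illustration, assuming a single body orbiting a fixed central mass and a semi-implicit (symplectic) Euler step; the integrator, the function name `integrate_orbit`, and the `(t, x, y, vx, vy)` state layout are assumptions made for illustration, not anything from the original article. What it shows is the loop-carried dependency: each pass needs the position and velocity written by the pass before it.

```python
import math


def integrate_orbit(state, dt, steps, gm=1.0):
    """Advance (t, x, y, vx, vy) by `steps` time steps of size `dt`.

    Uses semi-implicit Euler around a central body with gravitational
    parameter `gm` (hypothetical setup). Each pass reads what the previous
    pass wrote, so step n+1 cannot start until step n has finished.
    """
    t, x, y, vx, vy = state
    for _ in range(steps):
        t += dt                      # increment time a little bit
        r3 = math.hypot(x, y) ** 3
        ax = -gm * x / r3            # acceleration needs the current position
        ay = -gm * y / r3
        vx += ax * dt                # velocity needs that acceleration
        vy += ay * dt
        x += vx * dt                 # position needs the new velocity
        y += vy * dt
    return (t, x, y, vx, vy)


if __name__ == "__main__":
    # Circular orbit of radius 1 (gm = 1, speed 1): one period is about 2*pi.
    start = (0.0, 1.0, 0.0, 0.0, 1.0)
    t, x, y, vx, vy = integrate_orbit(start, dt=1e-3, steps=6283)
    print(f"t = {t:.3f}, radius = {math.hypot(x, y):.4f}")
```

Running two steps gives exactly the same state as running one step and feeding the result back in for another, which is the chain the post says can't be vectorized or parallelized within a single trajectory.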