Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!uwvax!astroatc!johnw
From: johnw@astroatc.UUCP (John F. Wardale)
Newsgroups: comp.arch
Subject: Re: chewing up mips with graphics [parallel computing]
Message-ID: <338@astroatc.UUCP>
Date: Thu, 25-Jun-87 17:38:30 EDT
Article-I.D.: astroatc.338
Posted: Thu Jun 25 17:38:30 1987
Date-Received: Sat, 27-Jun-87 04:00:09 EDT
References: <8270@amdahl.amdahl.com> <359@rocky2.UUCP>
Reply-To: johnw@astroatc.UUCP (John F. Wardale)
Organization: Astronautics Technology Cntr, Madison, WI
Lines: 77
Keywords: parallel computing; performance; vector computing

In article <5793@think.UUCP> bradley@godot.think.com.UUCP (Bradley Kuszmaul) writes:
>In article <337@astroatc.UUCP> I (John F. Wardale) wrote:
>>...
>Actually, I interpret the 90%-10% rule as indicating that there might be
>a lot of parallelism in problems.  Since 90% of the time is spent
>executing a very small amount of code, it seems likely that there is a
>lot of data involved.  It is also likely that the dependencies between
>data are something smaller than "completely connected", and so it is
>likely that different parts of the data can be processed in parallel.
Comments:
1) The 90-10 rule was not well explained
2) The "lots of data" part I agree with
3) The lack of dependencies does *NOT* follow!

re: 1:  If you machine-parallelize, you get your gains from a
small section of code.  If you hand-parallelize, you have to KNOW
which 10% to do.
Look at the theoretical (i.e. guaranteed not to exceed) performance
of vector machines (like a Cray) and the actual performance on
real code.  (Try the LINPACK benchmark if you believe in
benchmarks.)  While it *IS* true that there *ARE* problems that
get 50+% of peak, most tend to get *MUCH* less.  Why?  Because
scalar performance frequently dominates.
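
To put rough numbers on "scalar performance frequently dominates", here
is a minimal sketch.  The rates (peak 100, scalar 10, in whatever units
you like) are made-up illustrative values, not measurements of any real
machine, and effective_rate is a hypothetical helper name.  It just
takes the work-weighted harmonic mean of the two rates:

```python
def effective_rate(vector_fraction, vector_rate, scalar_rate):
    """Sustained rate when vector_fraction of the work runs at vector_rate
    and the remainder runs at scalar_rate (work-weighted harmonic mean)."""
    time_per_unit = (vector_fraction / vector_rate
                     + (1.0 - vector_fraction) / scalar_rate)
    return 1.0 / time_per_unit

PEAK = 100.0    # assumed peak vector rate (illustrative only)
SCALAR = 10.0   # assumed scalar rate (illustrative only)

for f in (0.5, 0.9, 0.99):
    rate = effective_rate(f, PEAK, SCALAR)
    print(f"{f:.0%} vectorized: {rate:.1f} of {PEAK:.0f} ({rate / PEAK:.0%} of peak)")
```

Under these assumed rates, even a code that is 90% vectorized only
sustains a bit over half of peak, and a code that is 50% vectorized
gets under a fifth.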

re: 3:  Several studies (sorry, no refs) of scheduling real
programs for pipelined computers have shown that the
number of registers needed for complete register allocation was in
the twenties (i.e. 16 is pretty good, 32 is overkill).
--> Pipelining is parallelism (tho admittedly limited), so why
should we believe that multi-CPUs can extract more parallelism
than this?

Code blocks can be classed as:
1: parallelizable and vectorizable [example: zero an array]
2: parallelizable    [example: a=5; b=6]
3: vectorizable      
4: neither
An example of #4 is an orbital dif-eq shooting problem:
you have initial conditions and a set of differential equations.
In a loop you:
    increment time a little bit, calculate [sequentially dependent]
    values for acceleration, velocity, and position
end-loop

If anyone can vectorize or parallelize this, call me, I'll split
the profits with you.  :-)
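
The time-stepping loop described above might look like the sketch
below.  The specifics are assumptions for illustration only: a single
body around a fixed central mass in 2-D, units chosen so GM = 1, and a
semi-implicit Euler step (a real code would likely use a better
integrator).  The point is the dependency chain: each pass needs the
position and velocity produced by the pass before it.

```python
import math

def integrate_orbit(pos, vel, dt, steps, gm=1.0):
    """Step one body around a fixed central mass (semi-implicit Euler).

    Every iteration consumes the state produced by the previous one,
    so the iterations cannot be run side by side.
    """
    x, y = pos
    vx, vy = vel
    for _ in range(steps):
        r3 = (x * x + y * y) ** 1.5
        ax, ay = -gm * x / r3, -gm * y / r3   # acceleration needs the current position
        vx, vy = vx + ax * dt, vy + ay * dt   # velocity needs that acceleration
        x, y = x + vx * dt, y + vy * dt       # position needs the new velocity
    return (x, y), (vx, vy)

# One (approximate) period of a circular orbit of radius 1, speed 1.
(px, py), _ = integrate_orbit((1.0, 0.0), (0.0, 1.0), dt=0.001, steps=6283)
print(math.hypot(px, py))
```

Within one step there are only a handful of independent operations
(the x and y components), which is about as much parallelism as this
loop offers.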

>  The lesson is "Rules of thumb, such as Amdahl's Law, don't 
> have much to say about parallel computing".
Personally, I think such rules WILL apply to parallel
computations, and will also be used to crudely estimate how much
parallelism can be extracted from average codes.  My guess is
that it will be a one-digit or low two-digit number.
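
For a crude estimate of that kind, the usual statement of Amdahl's Law
is speedup = 1 / ((1 - p) + p/N) for parallel fraction p on N
processors.  A minimal sketch follows; the fractions and processor
counts are illustrative guesses, not data from any study:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's Law: speedup when parallel_fraction of the run time is
    spread evenly over `processors` and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

for p in (0.5, 0.9, 0.95):
    for n in (10, 100, 1000):
        print(f"p={p:.2f}  N={n:4d}  speedup={amdahl_speedup(p, n):6.2f}")
```

With p = 0.9 the speedup can never reach 10 no matter how many
processors you add, and with p = 0.95 it can never reach 20.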

As far as >1000 processor machines, if each could talk to >100
other processors, maybe the AI people could build a >= real-time
human brain simulator!

There are also enuf "embarrassingly parallel" problems that such
machines will have a good market, tho not as wide a market as
the real and small "Cray-like" machines.

BTW:  It would be wonderful if I'm wrong, and someone finds a way
to effectively split *ANY* problem to run on >100 processors in
parallel, but I won't believe it until I see it.


			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!