Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!mailrus!uflorida!gatech!hubcap!encore!bzs
From: encore!bzs@EDDIE.MIT.EDU (Barry Shein)
Newsgroups: comp.parallel
Subject: Re: Superlinear(?) speedups
Message-ID: <3932@hubcap.UUCP>
Date: 19 Dec 88 13:18:30 GMT
Sender: fpst@hubcap.UUCP
Lines: 40
Approved: parallel@hubcap.clemson.edu

I think it's unfortunate to immediately toss out non-CPU effects, it
often tosses out the baby with the bathwater. A while back, using
parallel make on an Encore Multimax, I observed a 50X speedup over
doing exactly the same build on a Vax750. Sheer processor comparison
only predicted about a 15X speedup (ie. a 6 NS32332 Encore, about 12
MIPS, vs a 750, about .75MIPS.) I think you would find this sort of
result on any decent MIMD (both of them :-)

I suppose you can argue that this is somehow cheating (?) but watching
a compile finish in 2 minutes instead of 100 minutes impressed me
(this was before I worked at Encore and I really had to build the 100+
C module systems on both systems for practical reasons, when I saw the
time difference I ran it again on both systems several times to see if
it was just an accident, nope, repeatedly about 50X difference.) It is
an unfortunate theory that dismisses that extra 3X speedup, no? The
whole system is parallelized, not just the ALUs (see next pp as to why
I say ALUs and not CPUs.)

As to the breadth-first problem where one might simulate this on a
uni-processor again we run into a practical problem. If the search
runs depth-first on N processors then the only way you can simulate it
on a uni is by saving and restoring a fair amount of context or using
a processor and algorithm which, for example, keeps a lot of state in
registers on the uni, probably requiring dozens of registers being
available which a parallel system couldn't take advantage of in the
same context. On several CPUs one does have dozens of registers
available. Some of this has to be taken into account.

So, in short, although it sounds good I suspect that these
super-linear speedups will be impossible to shoot down on a similar
uniprocessor, in practice. I realize that's unsatisfying to those who
like to ignore reality but it seems such practicalities are what
people trying to do real computations are concerned with.

I'm always astounded at the energy people will put into "theoretical"
arguments that parallel processing doesn't work (or doesn't work very
well, or is too complicated to utilize) while people around them are
sitting there making it work just fine.

	-Barry Shein, ||Encore||