Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!mailrus!uflorida!gatech!hubcap!encore!bzs From: encore!bzs@EDDIE.MIT.EDU (Barry Shein) Newsgroups: comp.parallel Subject: Re: Superlinear(?) speedups Message-ID: <3932@hubcap.UUCP> Date: 19 Dec 88 13:18:30 GMT Sender: fpst@hubcap.UUCP Lines: 40 Approved: parallel@hubcap.clemson.edu I think it's unfortunate to immediately toss out non-CPU effects, it often tosses out the baby with the bathwater. A while back, using parallel make on an Encore Multimax, I observed a 50X speedup over doing exactly the same build on a Vax750. Sheer processor comparison only predicted about a 15X speedup (ie. a 6 NS32332 Encore, about 12 MIPS, vs a 750, about .75MIPS.) I think you would find this sort of result on any decent MIMD (both of them :-) I suppose you can argue that this is somehow cheating (?) but watching a compile finish in 2 minutes instead of 100 minutes impressed me (this was before I worked at Encore and I really had to build the 100+ C module systems on both systems for practical reasons, when I saw the time difference I ran it again on both systems several times to see if it was just an accident, nope, repeatedly about 50X difference.) It is an unfortunate theory that dismisses that extra 3X speedup, no? The whole system is parallelized, not just the ALUs (see next pp as to why I say ALUs and not CPUs.) As to the breadth-first problem where one might simulate this on a uni-processor again we run into a practical problem. If the search runs depth-first on N processors then the only way you can simulate it on a uni is by saving and restoring a fair amount of context or using a processor and algorithm which, for example, keeps a lot of state in registers on the uni, probably requiring dozens of registers being available which a parallel system couldn't take advantage of in the same context. On several CPUs one does have dozens of registers available. Some of this has to be taken into account. So, in short, although it sounds good I suspect that these super-linear speedups will be impossible to shoot down on a similar uniprocessor, in practice. I realize that's unsatisfying to those who like to ignore reality but it seems such practicalities are what people trying to do real computations are concerned with. I'm always astounded at the energy people will put into "theoretical" arguments that parallel processing doesn't work (or doesn't work very well, or is too complicated to utilize) while people around them are sitting there making it work just fine. -Barry Shein, ||Encore||