Path: utzoo!attcan!uunet!ncrlnk!ncrcae!hubcap!aglew
From: aglew@mcdurb.Urbana.Gould.COM
Newsgroups: comp.parallel
Subject: Re: Superlinear(?) speedups
Message-ID: <3955@hubcap.UUCP>
Date: 21 Dec 88 14:07:21 GMT
Sender: fpst@hubcap.UUCP
Lines: 35
Approved: parallel@hubcap.clemson.edu

>How many examples of super-linear speedup can be explained
>by increases in memory-bandwidth due to more registers
>and more cache rather than increases in CPU horsepower?
>
>David Callahan

Probably many, if not most. But then, look - we know that 
the approximate shape of the curve that describes size
of fast memory vs. performance, right? Ie. for small amounts
of fast memory the ratio performance/memory is super-unitary;
actually, this curve goes up quite a way if the fast memory speed
remains constant. 'Course, at some point it turns back over, or else
caches won't work.

So, more processors => more fast memory, and if we're at the right
point in the curve we get a super-unitary payoff. But, you say,
why not put all that memory on a single processor? OK - but then
the memory speed falls off, typically by a logarithmic factor
(it's easier to put small amounts of fast memory on lots of processors
than it is to put a large amount of memory on a single processor).

Oh, you earlier argument about limiting context switch frequency
has too limited a concept of "context". There is also the context,
within a single job, like when you switch from processing matrix
A to matrix B, and then back again. Such context switches are part
of your algorithm, not artifacts of the multiprogramming environment
you are working in.

I have earlier posted a small conceptual "proof" that super-unitary
speedup is possible in "context-rich" problems. In that "proof" it
is possible to play around with the scaling factors to see under
what circumstances super-unitary speedup is possible. For example,
if fast memory speed stays constant for all sizes, super-unitary
sppedup is much less likely than if fast memory speed scales 
logarithmically.