Path: utzoo!attcan!uunet!ncrlnk!ncrcae!hubcap!aglew From: aglew@mcdurb.Urbana.Gould.COM Newsgroups: comp.parallel Subject: Re: Superlinear(?) speedups Message-ID: <3955@hubcap.UUCP> Date: 21 Dec 88 14:07:21 GMT Sender: fpst@hubcap.UUCP Lines: 35 Approved: parallel@hubcap.clemson.edu >How many examples of super-linear speedup can be explained >by increases in memory-bandwidth due to more registers >and more cache rather than increases in CPU horsepower? > >David Callahan Probably many, if not most. But then, look - we know that the approximate shape of the curve that describes size of fast memory vs. performance, right? Ie. for small amounts of fast memory the ratio performance/memory is super-unitary; actually, this curve goes up quite a way if the fast memory speed remains constant. 'Course, at some point it turns back over, or else caches won't work. So, more processors => more fast memory, and if we're at the right point in the curve we get a super-unitary payoff. But, you say, why not put all that memory on a single processor? OK - but then the memory speed falls off, typically by a logarithmic factor (it's easier to put small amounts of fast memory on lots of processors than it is to put a large amount of memory on a single processor). Oh, you earlier argument about limiting context switch frequency has too limited a concept of "context". There is also the context, within a single job, like when you switch from processing matrix A to matrix B, and then back again. Such context switches are part of your algorithm, not artifacts of the multiprogramming environment you are working in. I have earlier posted a small conceptual "proof" that super-unitary speedup is possible in "context-rich" problems. In that "proof" it is possible to play around with the scaling factors to see under what circumstances super-unitary speedup is possible. For example, if fast memory speed stays constant for all sizes, super-unitary sppedup is much less likely than if fast memory speed scales logarithmically.