Path: utzoo!mnetor!uunet!husc6!necntc!ames!amelia!orville.nas.nasa.gov!fouts
From: fouts@orville.nas.nasa.gov (Marty Fouts)
Newsgroups: comp.arch
Subject: Re: Single tasking the wave of the future?
Message-ID: <53@amelia.nas.nasa.gov>
Date: 23 Dec 87 19:12:41 GMT
References: <18@amelia.nas.nasa.gov> <2341@encore.UUCP> <25@amelia.nas.nasa.gov> <1030@alliant.Alliant.COM>
Sender: news@amelia.nas.nasa.gov
Reply-To: fouts@orville.nas.nasa.gov (Marty Fouts)
Organization: NASA Ames Research Center, Moffett Field, CA
Lines: 79
Keywords: parallel processing today

In article <1030@alliant.Alliant.COM> muller@alliant.UUCP (Jim Muller) writes:
>In <25@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes,
>>A problem with these more sophisticated mechanisms is that they can
>>lead to parallel execution in which the wall clock time goes up as a
>>function of the number of processors, rather than down...
> With vectorization this is typically due to excessive masking. In heavily
> IF'ed code, the processor(s) can end up doing much more work than necessary.
> The vector overhead generally is insignificant by comparison. True parallel
> processing, however, does not suffer this problem, since any one processor
> never has to execute the "not-to-be-executed" portions. For this reason,
> true parallel processing can reduce runtimes even on code for which vector
> processing is a liability.

Although it is true that parallelization doesn't suffer from vector
start-up overhead, it does suffer from varying degrees of
synchronization overhead and communication cost. There are
pathological algorithms that, because of that overhead, actually
perform worse on a multiprocessor in concurrent mode than they do in
single-processor mode on the same machine.

> Race conditions are typically a problem only when a compiler has been
> inappropriately "permitted" to optimize code that it normally would have
> refused.
> The obvious approach, of course, should be to debug in
> single-processor mode.
>

This of course assumes a compiler that can recognize the concurrency
and generate concurrent code. While Alliant has demonstrated that
concurrency can be detected in a wide range of constructs, there is a
much wider range of constructs for which the concurrency has to be
expressed explicitly (and a wide range of vendors whose compilers do
not deal with it at all). For this class of problem, it is very
difficult to generate correct code and even more difficult to debug it.

> Light (or electromagnetic pulses) can only travel so fast, so you have to
> make things smaller to make them faster (obviously).

This is not obvious, only typical. You can also make them faster by
making them more efficient. (;-)

Anyway, my problem with parallelism as a way to speed things up is
that it speeds up the machine's ability to handle a workload, which
doesn't help me because management will just put more users on it,
but it doesn't speed up the machine's ability to handle my intractable
problems.

>
> My background is science, not computer architecture, so I do not fully
> appreciate the "decades of parallel processing research" or the "past
> twenty years of various kinds of software/hardware". But I am quite
> familiar with some approaches, and have seen that it does work. The
> Alliant FX/8 was based on Kuck (1976), and in fact, was designed to
> address many of the objections you raised about parallel processing's
> limitations. It certainly cannot be described as equivalent to anything
> from 20 years ago. . .

To quibble: your reference is 12 years old, and is based on work done
several years earlier. I will split the decade with you and say 15
years instead. And, of course, Kuck's work is based on earlier theory.
I agree that commercially available multiprocessors have only become
accessible in the last 5 years, and that the technology Kuck described
isn't widely available yet.
I still maintain that if you stray from the path

>This is not an official Alliant response. I am simply an employee who feels
>differently than you do. I only mentioned the FX/8 because it is a good,
>real-world, functioning example that attempts to address the valid objections
>you raised, and because I understand it. Alliant had no input into this
>posting, nor did Alliant require me to write this disclaimer.

I have written code for the Alliant, and it does a good job of
recognizing the concurrency that it can find. In the interest of a
balanced point of view, I am trying to point out that there is a long
way to go before this approach will be widely usable to speed up
delivered performance on general-purpose workloads. By all means,
people with vectorizable code should use vector computers, people with
concurrent code should use multiple processors, and multiple
processors should be used for competitive multitasking. But we should
not look to these technologies to continue the once-rapid growth in
overall computer performance.
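The point about synchronization overhead, and about algorithms whose wall clock time goes up as processors are added, can be made concrete with a toy cost model. This is a sketch that does not come from the thread: the function names are hypothetical, and the assumption that synchronization cost grows linearly with the processor count is illustrative only, not a measurement of any real machine. With little parallel work and expensive synchronization, the modeled time rises with every processor added. With plenty of parallel work, it falls until the overhead term dominates.

```python
# Toy cost model (hypothetical parameters, not measurements) for the
# wall-clock time of a fixed job run on p processors.
#   serial:   work that cannot be parallelized
#   parallel: work that divides evenly among the p processors
#   sync:     synchronization/communication cost per extra processor,
#             ASSUMED to grow linearly with p (an illustrative choice)


def wall_clock(p, serial, parallel, sync):
    """Modeled wall-clock time on p processors."""
    if p < 1:
        raise ValueError("need at least one processor")
    # A single processor has nothing to synchronize with.
    overhead = sync * (p - 1)
    return serial + parallel / p + overhead


def best_processor_count(max_p, serial, parallel, sync):
    """Processor count in 1..max_p that minimizes the modeled time."""
    return min(range(1, max_p + 1),
               key=lambda p: wall_clock(p, serial, parallel, sync))


if __name__ == "__main__":
    # A "pathological" case: little parallel work, costly synchronization.
    for p in (1, 2, 4, 8):
        print(p, wall_clock(p, serial=1.0, parallel=4.0, sync=3.0))
```

With these made-up parameters the modeled times for 1, 2, 4 and 8 processors are 5.0, 6.0, 11.0 and 22.5, so the single-processor run is fastest. The model deliberately ignores contention, load imbalance, and vector effects such as masking; it only shows how a per-processor overhead term can outweigh the division of work.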