Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!psuvax1!psuvm.bitnet!cunyvm!nyser!cmx!snacpac.npac.syr.edu!billo
From: billo@snacpac.npac.syr.edu (Bill O)
Newsgroups: comp.arch
Subject: Re: Multi-Processor Serializability
Keywords: data ordering, coherence, shared memory multiprocessing
Message-ID: <1066@cmx.npac.syr.edu>
Date: 1 Feb 89 04:23:18 GMT
References: <3492@cloud9.Stratus.COM>
Sender: usenet@cmx.npac.syr.edu
Reply-To: billo@snacpac.npac.syr.edu.UUCP (Bill O'Farrell)
Organization: Northeast Parallel Architectures Center
Lines: 113

In article <3492@cloud9.Stratus.COM> tomc@cloud9.Stratus.COM (Tom Clark) writes:
>In any computer system, the programmer expects operations in the source
>code to be carried out in the order specified.
> ...
>However, today compilers are optimizing and rearranging the order of the
>operations specified in the program (especially for RISC). In addition,
>newer high-performance processors will reorder operations within the
>chip to improve performance by the use of data-driven techniques. Also,
>newer computers have more complex busses (multiple paths) between the
>CPUs and memories. The problem of cache coherence also adds complexity
>to the problem.

But it is a problem that has been solved. There exist many protocols
for efficient coherent caches that correctly implement atomic lock
operations.

> ...
>The problem comes with implicit locks. An implicit lock is the
>dependence on the ordering of data references (both reads and writes).
>These are often very hard to find by inspection, even if one has the
>time to examine all parts of the source code. Have people thought of
>how to get older software to work on newer machines and compilers?
>Obviously older applications and operating systems would like to take
>advantage of the new technology,

No optimizing or parallelizing compiler worth its salt will reorder
statements (either through parallelization or global optimization) if
such reordering would change the meaning of the program. The technology
of optimizing compilers is about 30 years old, and is very robust. The
technology of automatically parallelizing compilers is, perhaps, 10
years old, but there are many examples of success.

A couple that we know about at NPAC are the parallelizing Fortran
compilers for the Alliant FX/80 and the Encore Multimax. Compilers of
this sort examine programs for loops that can be run in parallel on
separate processors, and insert synchronization points for any data
dependencies that are found. The Alliant compiler also examines loops
for vectorizability, and will usually produce code that runs
"concurrent outer, vector inner", which means that an outer loop is
having its iterations performed in parallel, while the inner loop has
been "unwound" into vector operations. Both the Alliant and Encore
compilers also perform "good old" global optimization, and both
never -- NEVER -- perform an optimization if it would change the
meaning of a program.

>
>I'd like to hear any suggestions for dealing with this problem. Even if
>you can handle the issue for your own code, how do you train a customer
>to do it for their code?

I have a good deal of experience with the Alliant compiler, so I'll
talk about it. The compiler, by its very nature, is conservative. It
does not perform an optimization or parallelization if it thinks
there's any chance that it could change the order of interdependent
computations. Of course, it is sometimes too conservative, and will
fail to optimize where it really could have. In these cases it prints
an "informational message" which says what it thinks is the problem.
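
To make the data-dependence point concrete, here is a minimal sketch in
C. It is not output from, or input to, the Alliant or Encore compilers,
and every function name in it is made up for illustration. It shows
three kinds of loop: one whose iterations are independent, one with a
loop-carried dependence that must run in order, and one where the
dependence can only be known at run time, which is the kind of case
where a conservative compiler would decline to parallelize. Running the
last loop backwards stands in for "some other iteration order", as a
parallel run might produce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example functions; none of these come from the post. */

/* Independent iterations: a[i] depends only on b[i] and c[i], so the
   iterations may run in any order, or in parallel. */
void add_independent(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
   i - 1 wrote. The iterations must run in order. */
void prefix_sum(double *a, const double *b, size_t n)
{
    assert(n > 0);
    a[0] = b[0];
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Dependence unknown at compile time: two iterations conflict only if
   idx[] contains a duplicate. A conservative compiler has to assume
   that it might. */
void scatter_forward(double *a, const size_t *idx, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[idx[i]] = b[i];
}

/* The same loop with the iterations run in the opposite order. */
void scatter_backward(double *a, const size_t *idx, const double *b, size_t n)
{
    for (size_t i = n; i-- > 0; )
        a[idx[i]] = b[i];
}
```

With idx = {2, 0, 1} the forward and backward scatters agree, so
reordering is harmless. With idx = {0, 0, 1} they leave different
values in a[0], which is why the compiler has to assume the worst
unless it is told otherwise.
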
The programmer can then opt, if he/she feels the compiler has been too
conservative, to include a compiler directive in the code telling the
compiler to go ahead and optimize anyway. These informational messages
are tremendously helpful to our users. Being primarily a Fortran
engine, the FX/80 is used principally by "real" scientific Fortran
programmers -- not computer scientists -- yet we have had real success
in training our users how to interpret the messages, and when to try
compiler directives.

>What techniques (hardware and software) can be
>applied?

Well, automatically parallelizing compilers, as mentioned, are a viable
option. Perhaps the most aggressive compiler of this sort is the
Fortran compiler for the Multiflow Trace machines. The Trace compilers
move code even when it *will* affect meaning -- specifically,
assumptions are made about the result of branch tests before the test
is performed. These assumptions allow more parallelism to be exploited,
and the compiler can "get away with it" because it inserts extra
instructions to "undo the damage" in cases when the branch went a
different way than predicted. Just as with the Alliant compiler, and
with any good optimizing compiler, the overall semantics of the program
are not changed. This is not handwaving. All of the techniques used by
such compilers are provably correct. (Naturally, any compiler may have
bugs, and an optimizing compiler is no exception, but that is a problem
of software engineering.)

I should point out that Alliant is developing an
optimizing/vectorizing/parallelizing C compiler too, so the techniques
aren't limited to Fortran. It just so happens that Fortran is where the
demand is, so that is why so much effort has been directed at the
development of Fortran compilers.

As for RISC, perhaps Tom is thinking about RISC-specific techniques
such as inserting a branch sooner in the code than one would for a
non-RISC machine.
But such branch instructions are always of the deferred branch sort,
which guarantees that the semantics of the program will be preserved.
Of course, RISC compilers also may perform global optimization, but
such techniques are well known and understood.

Finally, concerning explicit locks, an optimizing compiler will never
move an explicit lock operation, either because the locking operation
is built into the language, and so it knows better, or because the
locking operation is performed by an external subroutine, which the
compiler will likely make worst-case assumptions about, thus preserving
language semantics.

Is the problem solvable? YES, but maybe I've just been rambling, and
haven't answered your question.

>
> - Tom Clark
> - Stratus Computer, Inc., Marlboro, MA
> - Disclaimer: My opinions, nobody elses.

Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University
(billo@cmx.npac.syr.edu)