Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!psuvax1!psuvm.bitnet!cunyvm!nyser!cmx!snacpac.npac.syr.edu!billo
From: billo@snacpac.npac.syr.edu (Bill O)
Newsgroups: comp.arch
Subject: Re: Multi-Processor Serializability
Keywords: data ordering, coherence, shared memory multiprocessing
Message-ID: <1066@cmx.npac.syr.edu>
Date: 1 Feb 89 04:23:18 GMT
References: <3492@cloud9.Stratus.COM>
Sender: usenet@cmx.npac.syr.edu
Reply-To: billo@snacpac.npac.syr.edu.UUCP (Bill O'Farrell)
Organization: Northeast Parallel Architectures Center
Lines: 113

In article <3492@cloud9.Stratus.COM> tomc@cloud9.Stratus.COM (Tom Clark) writes:
>In  any computer system, the programmer expects operations in the source
>code to be carried out in the order specified.
> ...
>However, today compilers are optimizing and rearranging the order of the
>operations specified in the program (especially for RISC).  In addition,
>newer  high-performance  processors  will  reorder operations within the
>chip to improve performance by the use of data-driven techniques.  Also,
>newer computers have more complex busses (multiple  paths)  between  the
>CPUs  and memories.  The problem of cache coherence also adds complexity
>to the problem.

But it is a problem that has been solved. There exist many protocols for
efficient coherent caches that correctly implement atomic lock operations.

> ...
>The  problem  comes  with  implicit  locks.  An  implicit  lock  is  the
>dependence on the ordering of data references (both reads  and  writes).
>These  are  often  very  hard to find by inspection, even if one has the
>time to examine all parts of the source code.  Have  people  thought  of
>how  to  get  older  software  to  work on newer machines and compilers?
>Obviously older applications and operating systems would  like  to  take
>advantage  of  the new technology,

No optimizing or parallelizing compiler worth its salt will reorder statements
(either through parallelization or global optimization) if such reordering
would change the meaning of the program. The technology of optimizing compilers
is about 30 years old, and is very robust. The technology of automatically
parallelizing compilers is, perhaps, 10 years old, but there are many examples
of success. A couple that we know about at NPAC are the parallelizing
Fortran compilers for the Alliant FX/80 and the Encore Multimax. Compilers
of this sort examine programs for loops that can be run in parallel on
separate processors, and insert synchronization points for any data dependencies
that are found. The Alliant compiler also examines loops for vectorizability,
and will usually produce code that runs "concurrent outer, vector inner", which
means that an outer loop is having its iterations performed in parallel, while
the inner loop has been "unwound" into vector operations. Both the Alliant
and Encore compilers also perform "good old" global optimization, and neither
will ever, EVER perform an optimization if it would change the meaning of a
program.
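
To make the dependence check concrete, here is a toy sketch in Python.
It is only an illustration of the idea, not anything the Alliant or
Encore compilers actually run, and the names (run, scale, prefix_sum)
are made up for this example. A loop whose iterations touch only their
own element gives the same answer in any order, so its iterations may
be handed to separate processors; a loop that reads what the previous
iteration wrote does not.

```python
def run(body, a, order):
    """Apply one loop body to a copy of `a`, visiting indices in `order`."""
    a = list(a)
    for i in order:
        body(a, i)
    return a

def scale(a, i):
    # Iteration i reads and writes only a[i]: no loop-carried dependence.
    a[i] = 2 * a[i]

def prefix_sum(a, i):
    # Iteration i reads a[i-1], which iteration i-1 wrote:
    # a loop-carried dependence.
    if i > 0:
        a[i] = a[i] + a[i - 1]

data = [1, 2, 3, 4]
forward = list(range(len(data)))
backward = forward[::-1]

# Same result in both orders means the iterations are independent here.
scale_is_order_free = run(scale, data, forward) == run(scale, data, backward)
prefix_is_order_free = (
    run(prefix_sum, data, forward) == run(prefix_sum, data, backward)
)
```

In this sketch scale_is_order_free comes out true and
prefix_is_order_free comes out false. A real compiler decides this
statically from the subscripts rather than by running the loop, and for
the second kind of loop it either inserts synchronization or leaves the
loop serial.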

>
>I'd like to hear any suggestions for dealing with this problem.  Even if
>you  can handle the issue for your own code, how do you train a customer
>to do it for their code?

I have a good deal of experience with the Alliant compiler, so I'll
talk about it. The compiler, by its very nature, is conservative. It
does not perform an optimization or parallelization if it thinks
there's any chance that it could change the order of interdependent
computations.  Of course, it sometimes is too conservative, and will
fail to optimize where it really could have. In these cases it prints
an "informational message" which says what it thinks is the problem.
The programmer can then opt, if he/she feels the compiler has been too
conservative, to include a compiler directive in the code telling the
compiler to go ahead and optimize anyway. These informational messages
are tremendously helpful to our users. Being primarily a Fortran
engine, the FX/80 is used principally by "real" scientific Fortran
programmers -- not computer scientists, yet we have had real success
in training our users how to interpret the messages, and when to try
compiler directives.


>What techniques (hardware and software) can be
>applied?

Well, automatically parallelizing compilers, as mentioned, are a
viable option. Perhaps the most aggressive compiler of this sort is
the Fortran compiler for the Multiflow Trace machines. The Trace
compilers move code even when the move, taken by itself, *would*
affect meaning -- specifically, assumptions are made about the result
of branch tests before the test is performed. These assumptions allow
more parallelism to be exploited, and the compiler can "get away with
it" because it inserts extra instructions to "undo the damage" in
cases when the branch went a different way than predicted.  Just as
with the Alliant
compiler, and with any good optimizing compiler, the overall semantics
of the program are not changed.  This is not handwaving. All of the
techniques used by such compilers are provably correct. (Naturally,
any compiler may have bugs, and an optimizing compiler is no exception,
but that is a problem of software engineering).
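
The "undo the damage" idea can be sketched in a few lines of Python.
This is a loose model of the technique as described above, not
Multiflow's actual scheduling algorithm; the functions original and
scheduled and the variable saved_y are hypothetical names chosen for
the illustration. The scheduled version does the assignment before the
branch test, and repairs the value on the path where the guess was
wrong.

```python
def original(x, y, take_fast_path):
    # Source order: the assignment happens only on the predicted path.
    if take_fast_path:
        y = x * 2
        return y + 1
    return y - 1

def scheduled(x, y, take_fast_path):
    saved_y = y        # keep the old value so it can be restored
    y = x * 2          # moved above the branch, assuming the fast path
    if take_fast_path:
        return y + 1
    y = saved_y        # compensation code: undo the speculative write
    return y - 1

# Compare the two versions over a small range of inputs.
same_everywhere = all(
    original(x, y, t) == scheduled(x, y, t)
    for x in range(-3, 4)
    for y in range(-3, 4)
    for t in (True, False)
)
```

Here same_everywhere is true: the early assignment is visible inside
the routine, but no caller can tell the two versions apart, which is
the sense in which the overall semantics are preserved.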

I should point out that Alliant is developing an
optimizing/vectorizing/parallelizing C compiler too, so the techniques
aren't limited to Fortran. It just so happens that Fortran is where
the demand is, so that is why so much effort has been directed at the
development of Fortran compilers.

As for RISC, perhaps Tom is thinking about RISC-specific techniques such
as inserting a branch sooner in the code than one would for a non-RISC
machine. But such branch instructions are always of the deferred branch
sort, which guarantees that the semantics of the program will be preserved.
Of course, RISC compilers also may perform global optimization, but such
techniques are well known and understood.

Finally, concerning explicit locks, an optimizing compiler will never
move an explicit lock operation, either because the locking operation
is built into the language, and so it knows better, or because the
locking operation is performed by an external subroutine, which the
compiler will likely make worst-case assumptions about, thus preserving
language semantics.

Is the problem solvable?

YES, but maybe I've just been rambling, and haven't answered your question.

>
>        - Tom Clark
>        - Stratus Computer, Inc., Marlboro, MA
>        - Disclaimer: My opinions, nobody elses.

Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University
(billo@cmx.npac.syr.edu)