Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Multi-Processor Serializability
Summary: Some things can be done easily, some not.
Keywords: data ordering, coherence, shared memory multiprocessing
Message-ID: <1118@l.cc.purdue.edu>
Date: 2 Feb 89 12:06:12 GMT
References: <3492@cloud9.Stratus.COM>
Organization: Purdue University Statistics Department
Lines: 87

In article <3492@cloud9.Stratus.COM>, tomc@cloud9.Stratus.COM (Tom Clark) writes:
> In  any computer system, the programmer expects operations in the source
> code to be carried out in the order specified.

The programmer expects the results to be as if the operations were carried
out in the order specified.  I see no reason why it is necessary that the
actual computations be done in that order.  There are architectures where
that is not the case now.

<		 This is especially  true
< in  a  multi-processor  system  where multiple processes may share data.
< For example, process A may write two locations with x' and  y'.  Process
< B,  seeing  y',  assumes  x' has been written and reads it.  This may be
< extended causally by process B then writing  a  third  location  z'  and
< process  C,  seeing  z', knowing that x' must have been written.  If for
< some reason these writes are performed out of order, the model on  which
< the  programmers  have  based their thinking is wrong.  Programmers have
< been used to such a model  for  quite  a  while.  By  the  use  of  both
< explicit  and  implicit  locks,  they  have  exploited  it  to arrive at
< well-behaved high-performance MP systems.

One way of blocking this is to require that a write to a location, including
a register, put a block on that location.  Also, a pending read from a 
location must put a block on a write to that location.  For scalar
operations, this can be handled in hardware at some cost.  The only
cheap way is to have a write block all reads and writes, and a pending
read block all writes, but this will slow down code.

> However, today compilers are optimizing and rearranging the order of the
> operations specified in the program (especially for RISC).  In addition,
> newer  high-performance  processors  will  reorder operations within the
> chip to improve performance by the use of data-driven techniques.  Also,
> newer computers have more complex busses (multiple  paths)  between  the
> CPUs  and memories.  The problem of cache coherence also adds complexity
> to the problem.
> 
> In  the  case  of  explicit  locks, the programmer can give hints to the
> compiler and the hardware in  order  to  force  a  certain  sequence  of
> operations.  These hints may take the form of a 20 pound sledge at times
> (e.g.  forcing  serial  operation  of  the  CPU  or  disabling  compiler
> optimization in some parts of code) but they will work.
> 
> The  problem  comes  with  implicit  locks.  An  implicit  lock  is  the
> dependence on the ordering of data references (both reads  and  writes).
> These  are  often  very  hard to find by inspection, even if one has the
> time to examine all parts of the source code.  Have  people  thought  of
> how  to  get  older  software  to  work on newer machines and compilers?
> Obviously older applications and operating systems would  like  to  take
> advantage  of  the new technology, but finding all these problems before
> putting the system into production is very difficult.  The class of bugs
> introduced is such that testing cannot be depended upon  to  find  them,
> and they will likely occur when the system is very busy (cost of failure
> is high).
> 
> I'd like to hear any suggestions for dealing with this problem.  Even if
> you  can handle the issue for your own code, how do you train a customer
> to do it for their code?  What techniques (hardware and software) can be
> applied?  Is the problem solvable?

Here is a simple example of this problem, which requires knowledge of the
machine to handle.  It is an actual situation, not invented for this 
posting.

A good method for generating pseudo-random numbers is to form

	x[n] = x[n-j] OP x[n-k],

where there are several choices for OP (exclusive or is always one of them)
depending on the machine.  Now it is necessary that the values of x[n-j]
and x[n-k], for sufficiently large n, are the values computed by the recur-
rence.  There only completely safe method I can think of is to make sure
that the number of new items generated at one time is at most min(j,k). 
Whether more can be done depends on machine parameters.

On the vector register machines in my ken, this is no problem, as the vector
lengths are fairly small, although it can rule out small values of j and k.
But on vector stream machines, if one knows, for example, that the an item M
units back in the array must have been stored before reading, all that is
needed is that j and k are at least M.

Thus it takes interaction between the programmer and the compiler, and some
machine dependence.  It is inefficient not to allow the programmer to over-
ride the safety features, but also dangerous.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)