Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik From: cik@l.cc.purdue.edu (Herman Rubin) Newsgroups: comp.arch Subject: Re: Multi-Processor Serializability Summary: Some things can be done easily, some not. Keywords: data ordering, coherence, shared memory multiprocessing Message-ID: <1118@l.cc.purdue.edu> Date: 2 Feb 89 12:06:12 GMT References: <3492@cloud9.Stratus.COM> Organization: Purdue University Statistics Department Lines: 87 In article <3492@cloud9.Stratus.COM>, tomc@cloud9.Stratus.COM (Tom Clark) writes: > In any computer system, the programmer expects operations in the source > code to be carried out in the order specified. The programmer expects the results to be as if the operations were carried out in the order specified. I see no reason why it is necessary that the actual computations be done in that order. There are architectures where that is not the case now. < This is especially true < in a multi-processor system where multiple processes may share data. < For example, process A may write two locations with x' and y'. Process < B, seeing y', assumes x' has been written and reads it. This may be < extended causally by process B then writing a third location z' and < process C, seeing z', knowing that x' must have been written. If for < some reason these writes are performed out of order, the model on which < the programmers have based their thinking is wrong. Programmers have < been used to such a model for quite a while. By the use of both < explicit and implicit locks, they have exploited it to arrive at < well-behaved high-performance MP systems. One way of blocking this is to require that a write to a location, including a register, put a block on that location. Also, a pending read from a location must put a block on a write to that location. For scalar operations, this can be handled in hardware at some cost. The only cheap way is to have a write block all reads and writes, and a pending read block all writes, but this will slow down code. > However, today compilers are optimizing and rearranging the order of the > operations specified in the program (especially for RISC). In addition, > newer high-performance processors will reorder operations within the > chip to improve performance by the use of data-driven techniques. Also, > newer computers have more complex busses (multiple paths) between the > CPUs and memories. The problem of cache coherence also adds complexity > to the problem. > > In the case of explicit locks, the programmer can give hints to the > compiler and the hardware in order to force a certain sequence of > operations. These hints may take the form of a 20 pound sledge at times > (e.g. forcing serial operation of the CPU or disabling compiler > optimization in some parts of code) but they will work. > > The problem comes with implicit locks. An implicit lock is the > dependence on the ordering of data references (both reads and writes). > These are often very hard to find by inspection, even if one has the > time to examine all parts of the source code. Have people thought of > how to get older software to work on newer machines and compilers? > Obviously older applications and operating systems would like to take > advantage of the new technology, but finding all these problems before > putting the system into production is very difficult. The class of bugs > introduced is such that testing cannot be depended upon to find them, > and they will likely occur when the system is very busy (cost of failure > is high). > > I'd like to hear any suggestions for dealing with this problem. Even if > you can handle the issue for your own code, how do you train a customer > to do it for their code? What techniques (hardware and software) can be > applied? Is the problem solvable? Here is a simple example of this problem, which requires knowledge of the machine to handle. It is an actual situation, not invented for this posting. A good method for generating pseudo-random numbers is to form x[n] = x[n-j] OP x[n-k], where there are several choices for OP (exclusive or is always one of them) depending on the machine. Now it is necessary that the values of x[n-j] and x[n-k], for sufficiently large n, are the values computed by the recur- rence. There only completely safe method I can think of is to make sure that the number of new items generated at one time is at most min(j,k). Whether more can be done depends on machine parameters. On the vector register machines in my ken, this is no problem, as the vector lengths are fairly small, although it can rule out small values of j and k. But on vector stream machines, if one knows, for example, that the an item M units back in the array must have been stored before reading, all that is needed is that j and k are at least M. Thus it takes interaction between the programmer and the compiler, and some machine dependence. It is inefficient not to allow the programmer to over- ride the safety features, but also dangerous. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)