Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Multi-Processor Serializability
Summary: Can the compiler do a good job?
Keywords: data ordering, coherence, shared memory multiprocessing
Message-ID: <1120@l.cc.purdue.edu>
Date: 4 Feb 89 12:56:51 GMT
References: <3492@cloud9.Stratus.COM> <19635@lll-winken.LLNL.GOV> <7650@polyslo.CalPoly.EDU>
Organization: Purdue University Statistics Department
Lines: 54

In article <7650@polyslo.CalPoly.EDU>, cquenel@polyslo.CalPoly.EDU (88 more school days) writes:

......................

> Wait a minute. What exactly is "ordering of instructions" ?
> You talk about it as if there is one true "order" for instructions
> generated by the compiler, and optimizations *change* that order.
>
> The compiler GENERATES the order. Probably no two compilers will
> generate the SAME code sequence for a significantly large program.
> So which one is right ? *Neither* obviously, right ? It depends.
>
> So, what you're trying to say is this : I want the compiler
> to generate code within certain constraints. I want it to
> generate code that is what I expect, so that I can second-guess
> the code and do things with my code that are directly supported
> by the language.
>
> This is not a goal that I can frown on easily.

Nor I. But the problem is more difficult than it seems. A "bug" was found in the CDC 6600 architecture. Consider the following machine instructions, which I am writing in pseudocode, as I do not expect the readers to understand COMPASS. Xi denotes X register i.

        X7 = X0/X1
        memloc = X7
        if(X3 < 0) goto LL
        ...
    LL  memloc = X6
        ...

But the timing is such that if X3 < 0, then the "second" store is completed before the "first" store begins, so memloc is left holding X7 instead of X6. In the original architecture, a pending store did not block the issue of a later store instruction; this was changed to prevent the problem.
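The reordering can be mimicked with a toy model. This is only a sketch in Python, not COMPASS: the function name final_memloc, the register values, and all cycle counts are invented for illustration and are not real CDC 6600 timings. Each store completes one cycle after it has issued and its data is ready, and an optional interlock holds a store back until the previous one has completed.

```python
# Toy model of two stores to the same location. Cycle counts are invented
# for illustration; they are not real CDC 6600 timings.

def final_memloc(stores, interlock):
    """Return the value left in memloc.

    stores: (issue_cycle, data_ready_cycle, value) tuples in program order.
    A store completes one cycle after it has issued and its data is ready.
    With interlock=True a store cannot issue until the previous store has
    completed, which is the hardware change described in the post.
    """
    completions = []
    prev_done = 0
    for issue, ready, value in stores:
        if interlock:
            issue = max(issue, prev_done)
        done = max(issue, ready) + 1
        completions.append((done, value))
        prev_done = done
    # The store that completes last in time determines what memory holds.
    # sorted() is stable, so ties keep program order.
    return sorted(completions, key=lambda c: c[0])[-1][1]


X0, X1, X6 = 84, 2, 7
X7 = X0 // X1                      # slow divide: result ready at cycle 30 (assumed)

stores = [
    (1, 30, X7),                   # memloc = X7, waits for the divide
    (3, 3, X6),                    # LL: memloc = X6, data already available
]

print(final_memloc(stores, interlock=False))   # 42: the X7 store lands last
print(final_memloc(stores, interlock=True))    # 7: program order preserved
```

Without the interlock the store of X6 finishes at cycle 4 and the store of X7 at cycle 31, so memloc ends up with the stale quotient. With the interlock the second store cannot issue until cycle 31, and memloc ends up with X6 as the program intended.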
Of course, this change slows things down even when no such conflict occurs.

THIS one could be caught by a compiler. But there are others which cannot. The programmer could tell if different symbolic locations are used, but not in all cases. Suppose the first memloc were array1[B3] and the second array2[B4]; whether the two stores conflict then depends on the run-time contents of B3 and B4. It is much more difficult now. One could have hardware blocks, which would require the CPU to keep track of all pending load and store addresses.

The upshot is that one can put severe general restrictions on the compiler and assembler, or one can try to get the programmer into the act at the machine level. I strongly suggest the latter.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)