Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!nrl-cmf!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Multi-Processor Serializability
Keywords: data ordering, coherence, shared memory multiprocessing
Message-ID: <13250@winchester.mips.COM>
Date: 14 Feb 89 16:01:07 GMT
References: <3492@cloud9.Stratus.COM> <19635@lll-winken.LLNL.GOV> <3507@cloud9.Stratus.COM> <1170@houxs.ATT.COM> <58218@pyramid.pyramid.com>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 40

In article <58218@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>In article <3507...>, tomc@cloud9.Stratus.COM (Tom Clark) writes:
>> > In ANSI C the tool to use to enforce reference order for reads and writes
>> > is the keyword volatile, 
>> 
>> Unfortunately volatile does not do it.  Volatile does indeed force a write
>> to memory instead of holding an intermediate result in a register, but it
>> says nothing about ordering of instructions.  
>
>I must be wrong about this (cuz no one else has posted it yet)
>but are you SURE ANSI volatile ``says nothing about the ordering
>of instructions''?  Well, what does the part about ``the value
>of the volatile object shall agree with that prescribed by the
>abstract machine at all sequence points''?  Dang, I thought it
>meant that you couldn't move computations involving volatile
>object across sequence points, which to me means you have
>constraints on ordering.  As for operator application order, for
>the expression  a <op1> b <op2> c, the evaluation must proceed
>`as if' op1 were applied first, and then op2, etc.

When you do global optimization on the UNIX kernel, including device
drivers, "volatile" had better work "right", regardless of what the standard
says or doesn't say.  Specifically, if you declare something volatile:
	a) Take the completely unoptimized version of the code, and consider
	the sequence of loads and stores from/to volatile variables.
	b) The optimized version of the code had better do exactly the same
	number, in the same sequence, of such loads and stores.
If it doesn't work exactly this way, life will be hellish for the kernel folks,
especially those writing device drivers.
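
To make (a) and (b) concrete, here is a minimal sketch in C.  The names
dev_regs, ctrl, status and dev_reset are hypothetical, invented for this
illustration and not taken from any real driver, and the "device" is just
ordinary memory so the fragment can be compiled anywhere.  The unoptimized
code performs two stores to ctrl followed by two loads from status; under
the rule above, the optimized code should perform those same four accesses
in that same order.

```c
#include <stdint.h>

/* Hypothetical register block (an assumption for this sketch).  On real
   hardware this would sit at a fixed memory-mapped address. */
struct dev_regs {
    volatile uint32_t ctrl;    /* command register, written by the driver */
    volatile uint32_t status;  /* status register, read by the driver */
};

/* Unoptimized sequence of volatile accesses:
       store ctrl, store ctrl, load status, load status.
   If the fields were not volatile, an optimizer could legally delete the
   first store as dead and reuse the first load in place of the second.
   With volatile, all four accesses should survive, in this order. */
uint32_t dev_reset(struct dev_regs *dev)
{
    uint32_t first, second;

    dev->ctrl = 0x1;        /* store 1: assert reset */
    dev->ctrl = 0x0;        /* store 2: release reset */
    first  = dev->status;   /* load 1 */
    second = dev->status;   /* load 2: a real device may answer differently */
    return first | second;
}
```

One way to check a particular compiler is to build this with and without
optimization and compare the loads and stores to ctrl and status in the
generated assembly; the count and the order should match.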

BTW, doing this right does not necessarily mesh well with classical optimization
technology; it's very hard to get right and still have aggressive
optimization; bugs are very subtle.  [We used to do "binary search" on the
kernel, i.e., optimize half of it; if it still worked, optimize half of
the rest, etc., until the offending module was found.  This was not fun.]
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086