Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!bu.edu!encore!encore.encore.COM!alan
From: alan@encore.encore.COM (Alan Langerman)
Newsgroups: comp.sys.encore
Subject: Re: Sequential Consistency?
Message-ID: <15362@encore.Encore.COM>
Date: 20 Jun 91 17:05:11 GMT
References: <1991Jun4.210114.11355@crl.dec.com>
Sender: news@Encore.COM
Reply-To: alan@encore.com
Organization: Encore Computer Corp.
Lines: 64
Nntp-Posting-Host: encore.encore.com

In article <1991Jun4.210114.11355@crl.dec.com>, herlihy@crl.dec.com (Maurice Herlihy) writes:
|> Is the Multimax memory sequentially consistent? (I.e., do reads and writes
|> to memory occur in the order requested?) In the course of tracking down
|> strange behavior in an experimental mutual exclusion protocol, I wrote the
|> following simple memory exerciser (code at the end of the message).

The Multimax obeys weak sequential consistency.

In a strongly consistent system, all reads and writes from a given
processor would immediately invalidate/update all caches in the system.
However, there is a non-trivial performance penalty to be paid for this
consistency.

In a weakly consistent system, we can take full advantage of a
split-transaction, pended bus; that means that operations are never
synchronous across the bus, so the bus is never tied up waiting for a
memory or cache to respond. In a tightly-coupled, shared-memory
multiprocessor built around a central system bus, bus bandwidth becomes
a very precious resource indeed, so there is a significant advantage to
using a split-transaction, pended bus. However, the implication is that
multiprocessor read/write traffic can become interleaved and results
returned in an unexpected order.

In the Multimax, all single-stream operations are strongly consistent.
That is, the sequence of reads and writes generated by a single
processor will always be seen by the processor itself in the original
order.
For instance, if a write operation is pending in a buffer, a subsequent
read operation by the same processor will hit in the write buffer,
rather than obtaining a stale value from cache or memory.

However, the story becomes more complex when processors interact.
Invalidate traffic queues up through a FIFO for each cache. These
invalidates are normally processed asynchronously to the stream of
requests flowing from processor to cache. Thus, the following scenario
becomes possible:

	1. Processor A writes memory location X.
	2. Invalidate for location X queued in Processor B's FIFO.
	3. Processor B issues read request for location X.
	4. Processor B hits in cache and uses stale data.
	5. Processor B's cache finally processes invalidate request.
	   Too late.

Of course, the story does not end here. A lock instruction (sbitib) is
detected by the cache hardware, causing special processing. In brief,
this processing guarantees that the particular processor/cache's view of
the world will be consistent. All pending write operations are flushed
back to memory and all pending invalidates are handled by the cache
before the processor is allowed to continue. (The processor may then
acquire the lock, or fail to acquire it; that's irrelevant.)

By using lock instructions to guard access to shared data structures,
processors guarantee correct software synchronization. The hardware
detects lock instructions to guarantee strong sequential consistency
around synchronization points.

So, in sum: the Multimax implements weak sequential consistency: strong
sequential consistency is guaranteed around synchronization points but
not elsewhere.

Alan

P.S. If memory serves me, the "cinv" instruction exists only on NS32532
processors (and possibly follow-ons), and is privileged. If you use lock
instructions, you will not need to flush the cache.
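
For anyone who wants to poke at the five-step scenario above, here is a
toy software model of it. It is only a sketch under stated assumptions,
not a description of the real cache hardware: the names Memory, Cache,
drain_invalidates and lock_op are made up for illustration, writes are
modeled as immediate write-through (so there is no write buffer to
flush), and lock_op stands in only for the "handle all pending
invalidates before continuing" effect of a lock instruction.

```python
from collections import deque


class Memory:
    """Hypothetical shared memory plus the list of caches attached to it."""

    def __init__(self):
        self.cells = {}
        self.caches = []


class Cache:
    """Toy per-processor cache with a FIFO of pending invalidates."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}         # address -> cached value
        self.pending = deque()  # invalidates queued but not yet processed
        memory.caches.append(self)

    def read(self, addr):
        # A hit returns whatever is cached, even if an invalidate for
        # this address is still sitting in the FIFO.
        if addr not in self.lines:
            self.lines[addr] = self.memory.cells.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Simplification: write straight through to memory, then queue
        # an invalidate in every other cache's FIFO.
        self.lines[addr] = value
        self.memory.cells[addr] = value
        for other in self.memory.caches:
            if other is not self:
                other.pending.append(addr)

    def drain_invalidates(self):
        # Process every queued invalidate in FIFO order.
        while self.pending:
            self.lines.pop(self.pending.popleft(), None)

    def lock_op(self):
        # Assumed stand-in for the lock instruction's side effect:
        # the processor does not continue until the FIFO is empty.
        self.drain_invalidates()


mem = Memory()
a, b = Cache(mem), Cache(mem)
mem.cells["X"] = 0

b.read("X")          # B pulls X into its cache
a.write("X", 1)      # steps 1-2: A writes X, invalidate queued for B
stale = b.read("X")  # steps 3-4: B hits in cache on the old value
b.lock_op()          # synchronization point: B's FIFO is drained
fresh = b.read("X")  # B misses and refetches from memory

print(stale, fresh)  # prints "0 1"
```

The read before lock_op returns the old value and the read after it
returns the new one, which is the "consistent around synchronization
points but not elsewhere" behavior in miniature.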