Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!iuvax!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew From: aglew@dual.csg.uiuc.edu (Andy Glew) Newsgroups: comp.arch Subject: Re: Synchronization primitives and cache coherence Message-ID: Date: 13 Jun 90 23:38:53 GMT References: <1360003@aspen.IAG.HP.COM> Sender: usenet@ux1.cso.uiuc.edu (News) Organization: University of Illinois, Computer Systems Group Lines: 276 In-Reply-To: aglew@lasso.csg.uiuc.edu's message of 13 Jun 90 16:38:15 (Second try at posting - I hate silent errors) I've been working in this area, and have written a survey paper which is on hold pending finishing my thesis. I'm currently doing work on this, with a work in progress paper coming out in the midsummer Computer Architecture News issue, entitled "Snoopy Cache Test-and-test-and-set Without Excessive Bus Contention". The survey paper is 77K of troff. I could split that and email it to you if you want (or do the same with Postscript) or I could physical mail you the paper. Your post prompted me to posting the comparative tables of this paper to comp.arch. I'll email them to you as well. The comparative tables are 132 columns wide. The terms used in them are explained in the paper itself. 1.1. Table 1: Processors and Synchronization - - ----- - ---------- --- --------------- -------------------------------------------------------------------------------------------------------------------------------- | Table 1: Processors and Synchronization | | Processor Architecture | Implementation | | | Instruction | Type | Cache | External| Comments | | | | | | Signal | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |MIPS R2000 | none | | | | see Table 2 | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |NS32xxx | CBIT/SBIT | test-and-set-arbitrary-bit | bypass | ILO | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |MC68030 | TAS | load-store-fixed-bit | | | | | | -------------------|-----------------------------| | | ----------------------| | | CAS | compare-and-swap | bypass | RMC | Not always | | | CAS2 | " " queues | | | implemented | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |MC88x00 | XMEM | exchange-with-memory | bypass | LOCK | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |WE32100 | | exchange-with-memory | | | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |Pyramid | BITSW | test-and-set-arbitrary-bits| | | | | | -------------------|-----------------------------| bypass | | | | | ICMPW | fetch-and-add | ------ | | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |Clipper C300| TSTS | test-and-set-arbitrary-bit | bypass | LOCK | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |Intel 80286 | LOCK prefix | arbitrary operations | - | LOCK# | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |Intel i486 | LOCK prefix | short RMW instructions | | LOCK# | | | | -------------------|-----------------------------| | | | | | XCHG | exchange-with-memory | bypass cache | | | | | -------------------|-----------------------------| flush buffers | | | | | XADD | fetch-and-add | ----- ------- | | | | | -------------------|-----------------------------| | | | | | CMPXCHG | compare-and-swap | checks in cache first | | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |Intel i860 | lock | locks bus and | bypasses unsnooped | LOCK# | user | | | arbitrary code | blocks interrupts | chip cache | | | | | unlock | for 32 cycles | | | | | | -------------------|-----------------------------| | | ----------------------| | | LK bit | arbitrary sequences | | | priviliged | | | | | | | no time limit | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |AMD29000 | LOADSET | load-store-fixed | | LOCK | | | | -------------------|-----------------------------| | | ----------------------| | | LOADL, STOREL | not RMW | | | For use by systems | | | | | | | with special support?| | | -------------------|-----------------------------| | | ----------------------| | | LK bit in register| arbitrary sequences | | | priviliged | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |SPUR | TEST AND SET | load-store-fixed | RMW in cache | | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |SPARC | LDSTUB | load-store-fixed | bypass | LDST | | | | -------------------|-----------------------------| | | | | | SWAP | exchange-with-memory | | | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |Gould NP1 | TSM, ZSM, SSM | test-and-set-arbitrary-bits| bypass cache | | | | | | | flush local | | | | | | | and ISBL buffers | | | | | | | changes priority | | | | | | | on access to local cache| | | |------------|--------------------|-----------------------------|---------------------------|----------|-----------------------| |IBM RT | TSH | load-store-fixed | | | | |------------------------------------------------------------------------------------------------------------------------------| |IBM America 801 Storage - lock bits on cache lines, implicitly locked | |------------------------------------------------------------------------------------------------------------------------------| | HEP Full/empty bits on words in memory | -------------------------------------------------------------------------------------------------------------------------------- 1.2. Table 2: Systems and Synchronization - - ----- - ------- --- --------------- --------------------------------------------------------------------------------------------------------------------------------- | Table 2: Systems and Synchronization | | System | Processor | Cache | Synchronization| Type | Features | Privilige | | | | | Support | | | | |-----------------------------------------------------------------|-------------------------------------------------------------| | | | | SLIC gates | test-and-set | special distributed | kernel | | | | | | | synchronization registers.| | | | | | | | | | | | | | | | special bus for | | | | | | | | synchronization | | |Sequent | | write-through | | | | | | | | | flush buffers | | |Balance | NS32000 | invalidate | | | | | | | | | on release | | | | | | ----------------|------------------+----------------------------+-------------| | | | | ALM | load-store-fixed| special memory | user | | | | | | | normal bus | | | | | | | | mapped | | | | | | ----------------|-------------------------------------------------------------| | | | | xchg | exchange- | normal memory | user | | | | | | with-memory | normal bus | | |-----------------------------------------------------------------|-------------------------------------------------------------| |Sequent | 386? | copy-back | parallel locks | | | kernel | |Symmetry | | | | | | user | |-------------+------------+--------------------+-----------------|------------------+----------------------------+-------------| |SGI | MIPS R2000| copy-back | | test-and-set | special bus | kernel | |Powerseries | | | | | special memory | user | | | | | | | mapped | | |-----------------------------------------------------------------|-------------------------------------------------------------| |Alliant FX | custom | shared cache | CCU, CCB | fetch-and-add | special distributed | user (gang)| | | (68000 | | | | synchronization registers | | | | inst. set)| | | | | | | | | | | | special bus | | | | | | | | | | | | | | | | bus based combining | | |-------------+------------+--------------------+-----------------|------------------+----------------------------+-------------| |Gould NP1 | custom | write-through | semaphore | test-and-set- | flush local and | user | | | | invalidate | instructions | arbitrary-bit | remote buffers | | | | | | TSM, ZSM, SSM | | changes priority | | | | | | | | on cache access | | | | | | ----------------|-------------------------------------------------------------| | | | | queue ops | complex | | user | |-------------+------------|--------------------+-----------------+------------------+----------------------------+-------------- |Encore | NS32x32 | copy-back | | load-store-fixed| special bus | user | |Multimax | | | | | transaction | | | | | | | | normal memory | | | | | | | | changed processor | | | | | | | | semantics | | |-------------|------------|--------------------|-----------------|------------------|----------------------------|-------------| | | | trap, interrupts | | | | | |DECSTATION | | | | | | | | | | blocked in kernel| | | | | |3100 (MACH) | MIPS | | test-and-set | uniprocessor | user | | | | | -------------------| | | | | | | | Lamport | | | | | |-------------|------------|--------------------|-----------------|------------------|----------------------------|-------------| |NYU | | write-back | | fetch-and-add | non-bus based | user | |Ultracomputer| | | | | combining in | | | | | | | | interconnect | | --------------------------------------------------------------------------------------------------------------------------------- 1.3. Table 3: Cache Interfaces and Synchronization - - ----- - ----- ---------- --- --------------- ------------------------------------------------------ | Table 3: Cache Interfaces and Synchronization | | Interface | Actions | Comments | |-----------------------------------------------------+ SPUR cache | TEST AND SET | Modified | - - | | controller | performed in | NuBus. | | | cache | | -----------------+-----------------+------------------ | Austek A38152 | bypasses | 80x86 cache | | Microcache | cache invali- | controller | | | dates cached | | | | copies of | | | | lock | | |-----------------------------------------------------+ | Encore NanoBus | Converts | | | | NS32xxx set | | | | bit inter- | | | | locked into | | | | load-store- | | | | fixed | | ------------------------------------------------------ 1.4. Table 4: Busses and Synchronization - - ----- - ------ --- --------------- ---------------------------------------------------------------------------------- | Table 4: Busses and Synchronization | | Bus | Synchronization Support | Comments | +--------------------------------------------------------------------------------+ | EISA | LOCK signal | | ----------------+--------------------------------------------+-------------------- | Encore | special separate | Memory cards | | nanoBus | address/data return | perform | | | operation to acquire | load-store- | | | lock | fixed | | | | | | | | | | | inter- | | | | locked opera- | | | | tions can | | | | overlap | +--------------------------------------------------------------------------------+ | NuBus | non-preemptive re- | Based on bus | | | arbitration | scheduling | ----------------+--------------------------------------------+-------------------- | VSB | LOCK* signal | INTRASEQs | | ------------------+ | | | Higher level | | | | protocols re- | | | | quired for | | | | INTERSEQs | ----------------+--------------------------------------------+-------------------- | Futurebus | locked transactions | Busy unlocking | | | | multiple slaves | +--------------------------------------------------------------------------------+ | Futurebus+ | locked transactions | in-order | | -------------------------------------------+-------------------- | | Lock operations: | out-of-order | | | exchange | | | | fetch and add | | | | compare and swap | | +--------------------------------------------------------------------------------+ | VME | non-preemptive | bus lock | | -------------------------------------------+-------------------- | | continuous address strobe | interpreted | | | | as a resource | | | | lock in some | | | | systems | | ---------------------------------------------------------------+ | | non-standard LOCK signal on reserved pin | resource lock | ---------------------------------------------------------------------------------- 1.5. Table 5: Papers - - ----- - ------ ---------------------------------------------------------------------------------------- | Table 5: Papers | | Authors | Synchronization Technique | Comments | +--------------------------------------------------------------------------------------+ | VMP | software snoop actions | implemented | | | lock release bus signal | | ---------------------+------------------------------------+----------------------------- | Rudolph and Segall | test-and-test-and-set | ideal | | test-and-test-and-set coded - with race | +--------------------------------------------------------------------------------------+ Bitar and Despain | lock states in cache protocol | ---------------------+------------------------------------+----------------------------- | Eggers two write brodcasts after Rudolph and Segall | | then write invalidate | +--------------------------------------------------------------------------------------+ Goodman et al | queues through empty cache lines | multiple busses ---------------------+------------------------------------+----------------------------- | Anderson burst on lock release software solutions | | backoff | +--------------------------------------------------------------------------------------+ Dubois and Briggs | LOADSYNC, CONDSTORE | use snooper ---------------------+------------------------------------+----------------------------- | Glew and Hwu test-and-test-and-set bursts eliminated | | | with bus abandonment | | ---------------------------------------------------------------------------------------- -- Andy Glew, aglew@uiuc.edu