Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!apple!amdcad!dvorak.amd.com!nucleus!peter From: peter@nucleus.amd.com (Peter Song) Newsgroups: comp.arch Subject: Re: Living With Old Baggage Summary: CAS can be efficiently supported in RISCs Keywords: 68040 CAS2 Message-ID: <1991Apr23.184737.15377@dvorak.amd.com> Date: 23 Apr 91 18:47:37 GMT References: <336@scorpio.gtephx.UUCP> <12741@pt.cs.cmu.edu> <14628@encore.Encore.COM> Sender: peter@nucleus.amd.com Distribution: usa Organization: Advanced Micro Devices, Austin, TX Lines: 63 In article <14628@encore.Encore.COM> jcallen@encore.Com (Jerry Callen) writes: | cas2 is a horrible case of overkill, but a "reasonable" compare and swap | instruction would be AWFULLY handy. By "reasonable" I mean an instruction | that: | | - fetches an aligned word from memory | - compares it to a register | - if the values match, the value in another register is stored back | - a condition code is set somewhere that indicates whether or not the | store was done. | | The ancient IBM 370 offered essentially this instruction; the OS used it | extensively, and I used it to implement locks at user level. It was | sure handy! | | Yes, this is VERY complex by RISC standards, but it makes a number of | nasty synchronization problems almost trivially easy. You can sorta-kinda | emulate compare and swap using test and set (which the 88K has) and spin | locks, but to do so in complete safety you have to disable interrupts | (nasty from user code...). | | What makes a "reasonable" compare and swap so hard to implement that | current RISCs don't do it? It seems to me that implementing test and | set already forces MOST of the hard work to be done anyway. | | -- Jerry Callen | jcallen@encore.com An efficient compare-and-swap (CAS) function can be provided in RISC architectures without providing one instruction that does it. I'll give an example using the load linked (LL) and store conditionally (SC) instructions in the MIPS II architecture. The LL reads from a memory location and sets a status bit. The SC completes the write only if the status bit is still set - the status bit gets cleared when a SC writes to the location associated with the status bit. The following code provides the compare-and-swap(old,new,match) == compare-and-swap(T0,T1,T2) function: L: LL T4,(T0) ;T0 has the address of old BNE T4,T2,SKIP ;if old != match, skip NOP SC T1,(T0) ;write new atomically!! ;SC is success only if T1 is 1 (I think) ... SKIP: ;store in compare-and-swap didn't happen If several processes simultaneously execute the compare-and-swap, only one process that first executes the SC instruction will succeed. On a side note, I think the only real benefit of the compare-and-swap over the more simpler test-and-set is that it semantically defines when the write is to take place. This definition reduces the number of writes in spinlock that require coherency actions in cache-based systems. This is not to say that the test-and-set can't be implemented in the ways that eliminate the unnecessary writes. For instance, the Am29K has the LOADSET, a test-and-set type of instruction, that defines -1 to be the "set" value. Since the "set" value is defined in hardware, the LOADSET can check the lock and not start the write if the lock is already set. This is possible only if the instruction defines the "set" value. -Peter -- S. Peter Song - 29K Advanced Processor Development - Advanced Micro Devices peter@nucleus.amd.com (800) 531-5202 x54818