Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!stanford.edu!unix!Teknowledge.COM!uw-beaver!tera.com!bob From: bob@tera.com (Bob Alverson) Newsgroups: comp.arch Subject: Re: Atomic operations (Was Re: Living With Old Baggage) Keywords: 68040 CAS2 Message-ID: <1991May14.154653.26310@tera.com> Date: 14 May 91 15:46:53 GMT References: <1991May6.113149.14531@decvax.dec.com> <1991May13.140634.17669@dg-rtp.dg.com> <1991May14.033023.26083@iecc.cambridge.ma.us> Sender: news@tera.com (News Administrator) Organization: /etc/organization Lines: 35 In article <1991May14.033023.26083@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: >In article <1991May13.140634.17669@dg-rtp.dg.com> hamilton@siberia.rtp.dg.com (Eric Hamilton) writes: >>Atomic operations, such as compare-and-swap, that update memory with a value >>which is a function of the old value are generally useful. They allow you >>to do things that cannot be done with operations such as xmem or test-and- >>set, which update memory with a value that cannot be a function of the >>previous value. >> >>Compare-and-swap cannot be fabricated out of xmem or test-and-set without >>disabling interrupts or severely constraining its use. This is >>unsatisfactory for user code and many places inside the operating system. > > ... > >Most other locking primitives are less powerful than CAS. The only ones that >are as powerful are at least as complex, e.g. atomic queue operations with >peeking or atomic move from one shared memory location to another. > Unfortunately, CAS is much more costly in hardware than a plain swap or a fetch and op. A load uses an address and returns a word. A store uses an address and a word. Swap is the union of load and store, taking an address and a data word and returning a data word. Fetch and op is a little more, making the data stored a function of the data read. CAS is still more, needing an extra data word which may blaze new word wide datapaths through the memory access logic. I also take issue with the idea of n threads reaching consensus by acting on a single memory location. Using CAS the software is wait free, but the hardware must serialize all the accesses. I suppose you could do a combining network (NYU style), but can't the combining be implemented in software somehow? Or is that fundamentally impossible to do without waiting? Bob