Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!stanford.edu!unix!Teknowledge.COM!uw-beaver!tera.com!bob
From: bob@tera.com (Bob Alverson)
Newsgroups: comp.arch
Subject: Re: Atomic operations (Was Re: Living With Old Baggage)
Keywords: 68040 CAS2
Message-ID: <1991May14.154653.26310@tera.com>
Date: 14 May 91 15:46:53 GMT
References: <1991May6.113149.14531@decvax.dec.com> <1991May13.140634.17669@dg-rtp.dg.com> <1991May14.033023.26083@iecc.cambridge.ma.us>
Sender: news@tera.com (News Administrator)
Organization: /etc/organization
Lines: 35

In article <1991May14.033023.26083@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>In article <1991May13.140634.17669@dg-rtp.dg.com> hamilton@siberia.rtp.dg.com (Eric Hamilton) writes:
>>Atomic operations, such as compare-and-swap, that update memory with a value
>>which is a function of the old value are generally useful.  They allow you
>>to do things that cannot be done with operations such as xmem or test-and-
>>set, which update memory with a value that cannot be a function of the
>>previous value.
>>
>>Compare-and-swap cannot be fabricated out of xmem or test-and-set without
>>disabling interrupts or severely constraining its use.  This is
>>unsatisfactory for user code and many places inside the operating system.
>
> ...
>
>Most other locking primitives are less powerful than CAS.  The only ones that
>are as powerful are at least as complex, e.g. atomic queue operations with
>peeking or atomic move from one shared memory location to another.
>

Unfortunately, CAS is much more costly in hardware than a plain swap
or a fetch and op.  A load uses an address and returns a word.  A
store uses an address and a word.  Swap is the union of load and
store, taking an address and a data word and returning a data word.
Fetch and op is a little more, making the data stored a function of
the data read.  CAS is still more, needing an extra data word which
may blaze new word wide datapaths through the memory access logic.

I also take issue with the idea of n threads reaching consensus by
acting on a single memory location.  Using CAS the software is wait
free, but the hardware must serialize all the accesses.  I suppose you
could do a combining network (NYU style), but can't the combining be
implemented in software somehow?  Or is that fundamentally impossible
to do without waiting?

Bob