Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!usc!cs.utexas.edu!uunet!dg!rec
From: rec@dg.dg.com (Robert Cousins)
Newsgroups: comp.unix.wizards
Subject: Re: Information on SPARC assembly (atomic Test and Set)
Message-ID: <194@dg.dg.com>
Date: 20 Jun 89 12:44:09 GMT
References: <350@osc.COM> <577@lakart.UUCP> <5742@lynx.UUCP>
Reply-To: rec@dg.UUCP (Robert Cousins)
Organization: Data General, Westboro, MA.
Lines: 64

In article <5742@lynx.UUCP> m5@lynx.UUCP (Mike McNally) writes:
>In article <577@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
>>I have never understood the need for a test and set instruction, when
>>you can make do with adc (add with carry). Allow me to explain:
>>
>>The point behind TAS is to allow a process to test if a flag is set or
>>clear, and set it no matter what the result. But why does the test have
>>to be in the same instruction? 
>
>The example given by Mr. Goodenough in fact incorporates the changing of 
>the state of the flag in one instruction (the add-with-carry).  It is
>thus true that the sequence is unbreakable *at the OS level*: a normal
>OS will not reschedule while a task is in the middle of an instruction,
>because most CPU's won't allow interrupts in the middle of an instruction.
>(Note that this is not necessarily the case.)  A real TAS instruction
>often comes with the proviso that the bus cycles used to fetch and store
>are not interruptable either.  This guarantee is necessary in a multi-
>processor environment.
>
>I think that the x86 (x>0) series locks the bus on all XCHG instructions.
>The original chips required a LOCK prefix.  I don't know whether or not
>the LOCK is honored with other read/write instructions.

Actually, the LOCK prefix was somewhat more powerful than originally
intended in the initial 8086 family products.  One could use the LOCK prefix
before the REP prefix to build a locked string operation!  Since these could
be up to 64K iterations long and since the 8086 isn't that fast, it was
theoretically possible to lock other processors from the bus for extended
periods of time.

There is another reason why atomic operations are useful:  whenever
there is some modicum of peripheral intelligence (as is commonly found
with modern LAN controller chips), there arise cases in which memory
descriptors need to be updated in a controlled fashion.  For example,
after building a packet in memory, the packet must be linked into the
controller's outgoing packet list.  Since the controller may be actively
transmitting at that instant or, worse yet, may be traversing links in the
list to find the next packet, an atomic operation makes possible a "seamless"
insertion into the list.  However, relatively few systems are designed to
take advantage of this feature.
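
A minimal sketch of the "seamless" insertion, modelled in C11 on the host
side only.  The names tx_desc, tx_append and tx_next are hypothetical, and a
real controller dictates its own descriptor layout and may need cache or bus
handling that is not shown.  The point is just the ordering: fill in every
field of the new descriptor first, then publish it with one atomic store of
the link, so that a reader walking the list sees either the old end of the
list or a complete new entry.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical descriptor; real hardware defines its own layout. */
struct tx_desc {
    const void *buf;                  /* packet data */
    size_t len;                       /* packet length in bytes */
    _Atomic(struct tx_desc *) next;   /* link the controller follows */
};

/* Append d after tail.  Every field of d is written before the link in
 * tail is updated, and that update is a single atomic store. */
static struct tx_desc *tx_append(struct tx_desc *tail, struct tx_desc *d,
                                 const void *buf, size_t len)
{
    d->buf = buf;
    d->len = len;
    atomic_store_explicit(&d->next, (struct tx_desc *)NULL,
                          memory_order_relaxed);
    /* Publish: release ordering keeps the writes above ahead of the link. */
    atomic_store_explicit(&tail->next, d, memory_order_release);
    return d;
}

/* What a reader walking the list would do (stands in for the controller). */
static struct tx_desc *tx_next(struct tx_desc *cur)
{
    return atomic_load_explicit(&cur->next, memory_order_acquire);
}
```

This assumes a single producer appending at the tail; several producers
would need a lock or an exchange on the tail pointer as well.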

The interlocked exchange operation is perhaps the most common tool for
multiprocessor operation.  Using it, one can simulate the test-and-set
and test-and-clear operations, and, through careful use of global
values, sequenced locks and integer semaphores become practical.  Some 
CPU families go out of their way to add interlocked operations.  The DG
MV series and the NSC 32000 have a list of instructions which operate in
this fashion.
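
Roughly, in C11 terms, with atomic_exchange standing in for the hardware's
interlocked exchange.  All the names here (lock_word, test_and_set,
test_and_clear, spin_lock, sem_try_wait and so on) are made up for
illustration, and the semaphore is the simplest possible one: an integer
guarded by a spin lock.

```c
#include <stdatomic.h>

typedef atomic_int lock_word;   /* hypothetical name for the flag word */

/* Test-and-set: store 1 and return what was there before. */
static int test_and_set(lock_word *w)   { return atomic_exchange(w, 1); }

/* Test-and-clear: store 0 and return what was there before. */
static int test_and_clear(lock_word *w) { return atomic_exchange(w, 0); }

/* Spin until our exchange is the one that found the word clear. */
static void spin_lock(lock_word *w)
{
    while (test_and_set(w) != 0) {
        /* busy-wait */
    }
}

static void spin_unlock(lock_word *w) { atomic_store(w, 0); }

/* Integer semaphore: a plain count protected by the lock word. */
struct sem {
    lock_word guard;
    int count;
};

/* Take one unit if available; returns 1 on success, 0 if count was zero. */
static int sem_try_wait(struct sem *s)
{
    int got = 0;
    spin_lock(&s->guard);
    if (s->count > 0) {
        s->count--;
        got = 1;
    }
    spin_unlock(&s->guard);
    return got;
}

static void sem_post(struct sem *s)
{
    spin_lock(&s->guard);
    s->count++;
    spin_unlock(&s->guard);
}
```

A caller would initialize a semaphore as, say, struct sem s = { 0, 1 } and
retry sem_try_wait() until it succeeds.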

BTW, the TAS instruction makes barrier synchronization much simpler.  Without
it, writing a ROM to handle 'n' processors coming out of reset at the same
time and trampling over each other would not be as easy.
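
A sketch of the reset case, again in C11 and under assumptions: the names
boot_claimed, init_done, elect_boot_cpu, reset_entry and demo_init are
hypothetical, and it assumes the flag words sit in memory that is already
usable and in a known state when the processors start.  Every processor
runs the same entry code; the one whose test-and-set finds the flag clear
does the one-time setup, and the rest wait for it to finish.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag boot_claimed = ATOMIC_FLAG_INIT;
static atomic_bool init_done;   /* static storage, so it starts out false */

/* True for exactly one caller: the first to set the flag. */
static bool elect_boot_cpu(void)
{
    return !atomic_flag_test_and_set(&boot_claimed);
}

/* Common entry point for all processors leaving reset. */
static void reset_entry(void (*one_time_init)(void))
{
    if (elect_boot_cpu()) {
        one_time_init();                 /* only the winner runs this */
        atomic_store(&init_done, true);  /* release the others */
    } else {
        while (!atomic_load(&init_done)) {
            /* wait at the barrier until setup is complete */
        }
    }
}

/* Stand-in for the real setup work, counted for demonstration. */
static int init_runs;
static void demo_init(void) { init_runs++; }
```

However many processors call reset_entry(demo_init), init_runs ends up
at 1, and none of them gets past the barrier before the setup is done.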

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for Myself alone.

>-- 
>Mike McNally                                    Lynx Real-Time Systems
>uucp: {voder,athsys}!lynx!m5                    phone: 408 370 2233
>
>            Where equal mind and contest equal, go.