Path: utzoo!mnetor!uunet!lll-winken!lll-crg.llnl.gov!brooks
From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III)
Newsgroups: comp.arch
Subject: Re: Task synchronization in multiprocessors?
Message-ID: <4530@lll-winken.llnl.gov>
Date: 3 Mar 88 22:24:49 GMT
References: <317@amelia.nas.nasa.gov>
Sender: usenet@lll-winken.llnl.gov
Reply-To: brooks@lll-crg.llnl.gov.UUCP (Eugene D. Brooks III)
Organization: Lawrence Livermore National Laboratory
Lines: 33

In article <317@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
>I've just finished Stones recent book (High Performance Computer
>Architecture) which has left me wanting to know more about
>architectural support for task synchronization in multiprocessors.
>
>I'm particularly interested in production MIMD shared memory systems
>and alternatives to semaphore critical region or barrier
>synchronization.  A large part of the problem with using a semaphore
>for synchronization is that it introduces a hot spot in the memory
>system and forces some potentially parallel activities to be serial.
For a bottleneck free barrier synchronization algorithm, which is heavily
used on several existing multiprocessors, see "The Butterfly Barrier",
International Journal of Parallel Programming, Vol. 15, No. 4, Aug 86,
pp. 295-307.  This software algorithm, which is bottleneck free and runs
in logarithmic time, is critically compared to direct hardware barrier
support (in the context of Gauss elimination algorithms) using the Cerberus
multiprocessor simulator in the proceedings of the 1987 SIAM conference on
parallel processing for scientific computing, which should be out any
month now.....  See an article titled "Gaussian techniques on shared
memory multiprocessors", the Cerberus multiprocessor simulator itself is
also described in this proceedings.

As for alternatives to simple "locks", (implemented with swap register
with memory) and barriers (implemented either in hardware or with a
good software algorithm) you can go a surprising long way with these
and the "live lock" problems incurred with the combining operations
should not be ignored.


What?  You want to use a thousand processors?  We would be happy to
get a few tens jumping on a single application efficiently!


						Eugene