Path: utzoo!mnetor!uunet!lll-winken!lll-crg.llnl.gov!brooks From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III) Newsgroups: comp.arch Subject: Re: Task synchronization in multiprocessors? Message-ID: <4530@lll-winken.llnl.gov> Date: 3 Mar 88 22:24:49 GMT References: <317@amelia.nas.nasa.gov> Sender: usenet@lll-winken.llnl.gov Reply-To: brooks@lll-crg.llnl.gov.UUCP (Eugene D. Brooks III) Organization: Lawrence Livermore National Laboratory Lines: 33 In article <317@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes: >I've just finished Stones recent book (High Performance Computer >Architecture) which has left me wanting to know more about >architectural support for task synchronization in multiprocessors. > >I'm particularly interested in production MIMD shared memory systems >and alternatives to semaphore critical region or barrier >synchronization. A large part of the problem with using a semaphore >for synchronization is that it introduces a hot spot in the memory >system and forces some potentially parallel activities to be serial. For a bottleneck free barrier synchronization algorithm, which is heavily used on several existing multiprocessors, see "The Butterfly Barrier", International Journal of Parallel Programming, Vol. 15, No. 4, Aug 86, pp. 295-307. This software algorithm, which is bottleneck free and runs in logarithmic time, is critically compared to direct hardware barrier support (in the context of Gauss elimination algorithms) using the Cerberus multiprocessor simulator in the proceedings of the 1987 SIAM conference on parallel processing for scientific computing, which should be out any month now..... See an article titled "Gaussian techniques on shared memory multiprocessors", the Cerberus multiprocessor simulator itself is also described in this proceedings. As for alternatives to simple "locks", (implemented with swap register with memory) and barriers (implemented either in hardware or with a good software algorithm) you can go a surprising long way with these and the "live lock" problems incurred with the combining operations should not be ignored. What? You want to use a thousand processors? We would be happy to get a few tens jumping on a single application efficiently! Eugene