Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!cwjcc!gatech!hubcap!reed From: reed@m.cs.uiuc.edu Newsgroups: comp.parallel Subject: Re: fine-grained and/or multiway optimi Message-ID: <3605@hubcap.UUCP> Date: 21 Nov 88 00:12:57 GMT Sender: fpst@hubcap.UUCP Lines: 125 Approved: parallel@hubcap.clemson.edu I'd like to clarify a misconception: >khb@sum.com writes: >>In article <3465@hubcap.UUCP> wilson@uicbert.eecs.uic.edu writes: >> >> 3. How promising are optimistic techniques, really? They would >> *seem*, to a nonexpert like me, to be the only way around >> Amdahl's law without redesigning all of our algorithms to >> have tremendous parallelism, which is difficult. > These don't always make the rewrite easier. There was a paper > presented at a SIGMETRICS or two ago which purported to prove that > parallelism didn't work. The fellow worked for IBM, and strangely > enough the ruling committee really like the paper (gave it an honor). > Brian's results are extremely encouraging (as of a couple of years > ago). They built discrete simulations which worked. Got the speedup, > not infinite time loops. What more can we ask of a research project ? The paper to which you refer is more accessible as: D. A. Reed, A. D. Malony, and B. D. McCredie, "Parallel Discrete Event Simulation Using Shared Memory," IEEE Transactions on Software Engineering, Vol. 14, No. 4, April 1988, pp. 541-553. The paper considers the performance of ONE implementation of *conservative* distributed discrete event simulation. Unlike optimistic strategies (like Time Warp), conservative strategies advance the local clocks only when conditions insure that they will never be rolled back (i.e., all clock values must be monotonically increasing). In particular, the paper considers the general case when the underlying parallel simulation support has no knowledge of the structure of the simulation model. By comparison, this is similar to most sequential simulation systems -- the underlying process and event list management are oblivious to the structure or semantics of the simulation model. Why take this approach? Well, if it works, it provides the underlying support for "automatic" parallelism. By that, I mean that a traditional process-oriented simulation could be supported by a conservative distributed simulation mechanism, realizing the parallelism in the simulation model without requiring the simulation programmer to consider two levels of parallelism -- the one in the model and the model structure necessary to achieve the simulated parallelism. The argument for this approach is purely one of software engineering. Discrete event simulations most often model systems with some parallelism and these simulations are often written using process-oriented simulation languages (i.e., parallel programming languages). If the simulation programmer must also provide hints to the underlying simulation support about how to parallelize the process-oriented program, the programming burden grows. Ideally, the simulation support should automatically realize the parallelism in the simulation model. Finally, simulation programs are often changed as one explores structural variations of the simulated system. One would like to separate the realization of parallelism from the goal of simulation -- exploring system variations. The paper considers simulation of queueing network models on a Sequent Balance (bus-based shared memory system) running Unix. The implemented conservative simulation strategy uses no knowledge about the simulation semantics (e.g., queueing strategies or service time distributions). The servers of the queueing network simply exchange messages and execute in parallel. In the conservative strategy used, networks with cycles can deadlock when trying to insure monotonic clocks. Two mechanisms were used: deadlock avoidance and deadlock detection and recovery. The performance measurements showed that speedup was strongly dependent on the structure of the simulated network. This is not surprising; it is analogous to saying that speedup in parallel programs depends on the structure of the program. In our implementation networks with cycles often suffered from sizable overheads due to deadlock processing. Much of this is unnecessary; it is protecting against possible causality violations that never occur. As an aside, this is one argument for optimistic techniques. Unfortunately, most interesting models contain cyclic interactions among the model components. To achieve good speedup in these cases using conservative strategies, it seems likely that some knowledge about the simulated system must be used. This is the motivation for recent theoretical work on conditional knowledge (by Chandy and others). To summarize, the Reed/Malony/McCredie paper presents experimental evidence that the performance of conservative simulation implementations (of queueing network models) that do not rely on knowledge about the simulated system often yield limited speedups; the performance of individual cases depends on their structure. More recent experimental studies by Fujimoto (Utah) and Lazowska (Washington) have shown that taking advantage of knowledge about the model (.e.g., knowing that queues are FIFO) can significantly reduce the overhead for conservative simulation and yield good speedups (with accompanying loss of generality). This approach is highly promising and I am greatly impressed by the quality of their work. As an example, Lazowska and his students have implemented an object-oriented support package that provides much of the needed information automatically for the classes in the library (as one example, FIFO queues). The balance between simulation programming to express the semantics of the simulated system and to specify the information needed to parallelize the simulation is important. Just as in parallel programming, the goal is results, not parallelism. If achieving parallelism is difficult, time consuming, and overly sensitive to the model structure, the already difficult problems of simulation programming will become even more challenging. For conservative strategies, I believe, and this is my personal opinion unsupported by any hard evidence, the best solution is support tools that automatically obtain the bulk of the information needed to parallelize. This information may be obtained by some combination of language structure (e.g., declarations), object-oriented libraries (e.g., FIFO queues), or compilers. There are lots of interesting, open questions in distributed simulation, both conservative and optimistic. The next few years will tell the tale. Dan Reed Department of Computer Science University of Illinois Urbana, Illinois 61801 reed@a.cs.uiuc.edu