Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!cwjcc!gatech!hubcap!reed
From: reed@m.cs.uiuc.edu
Newsgroups: comp.parallel
Subject: Re: fine-grained and/or multiway optimi
Message-ID: <3605@hubcap.UUCP>
Date: 21 Nov 88 00:12:57 GMT
Sender: fpst@hubcap.UUCP
Lines: 125
Approved: parallel@hubcap.clemson.edu

I'd like to clarify a misconception:

>khb@sum.com writes:

>>In article <3465@hubcap.UUCP> wilson@uicbert.eecs.uic.edu writes:
>>
>>   3.  How promising are optimistic techniques, really?  They would
>>       *seem*, to a nonexpert like me, to be the only way around
>>       Amdahl's law without redesigning all of our algorithms to
>>       have tremendous parallelism, which is difficult.

> These don't always make the rewrite easier. There was a paper
> presented at a SIGMETRICS or two ago which purported to prove that
> parallelism didn't work. The fellow worked for IBM, and strangely
> enough the ruling committee really like the paper (gave it an honor).
> Brian's results are extremely encouraging (as of a couple of years
> ago). They built discrete simulations which worked. Got the speedup,
> not infinite time loops. What more can we ask of a research project ?


The paper to which you refer is more accessible as:

	D. A. Reed, A. D. Malony, and B. D. McCredie, "Parallel
	Discrete Event Simulation Using Shared Memory," IEEE
	Transactions on Software Engineering, Vol. 14, No. 4,
	April 1988, pp. 541-553.

The paper considers the performance of ONE implementation of
*conservative* distributed discrete event simulation.  Unlike
optimistic strategies (like Time Warp), conservative strategies advance
the local clocks only when conditions insure that they will never be
rolled back (i.e., all clock values must be monotonically increasing).

In particular, the paper considers the general case when the underlying
parallel simulation support has no knowledge of the structure of the
simulation model.  By comparison, this is similar to most sequential
simulation systems -- the underlying process and event list management
are oblivious to the structure or semantics of the simulation model.
Why take this approach?  Well, if it works, it provides the underlying
support for "automatic" parallelism.  By that, I mean that a
traditional process-oriented simulation could be supported by a
conservative distributed simulation mechanism, realizing the
parallelism in the simulation model without requiring the simulation
programmer to consider two levels of parallelism -- the one in the
model and the model structure necessary to achieve the simulated
parallelism.

The argument for this approach is purely one of software engineering.
Discrete event simulations most often model systems with some
parallelism and these simulations are often written using
process-oriented simulation languages (i.e., parallel programming
languages).  If the simulation programmer must also provide hints to
the underlying simulation support about how to parallelize the
process-oriented program, the programming burden grows.  Ideally, the
simulation support should automatically realize the parallelism in the
simulation model.  Finally, simulation programs are often changed as
one explores structural variations of the simulated system.  One would
like to separate the realization of parallelism from the goal of
simulation -- exploring system variations.

The paper considers simulation of queueing network models on a Sequent
Balance (bus-based shared memory system) running Unix.  The implemented
conservative simulation strategy uses no knowledge about the simulation
semantics (e.g., queueing strategies or service time distributions).
The servers of the queueing network simply exchange messages and
execute in parallel.  In the conservative strategy used, networks with
cycles can deadlock when trying to insure monotonic clocks.  Two
mechanisms were used:  deadlock avoidance and deadlock detection and
recovery.  The performance measurements showed that speedup was
strongly dependent on the structure of the simulated network.  This is
not surprising; it is analogous to saying that speedup in parallel
programs depends on the structure of the program.   In our
implementation networks with cycles often suffered from sizable
overheads due to deadlock processing.  Much of this is unnecessary; it
is protecting against possible causality violations that never occur.
As an aside, this is one argument for optimistic techniques.
Unfortunately, most interesting models contain cyclic interactions
among the model components.  To achieve good speedup in these cases
using conservative strategies, it seems likely that some knowledge
about the simulated system must be used.  This is the motivation
for recent theoretical work on conditional knowledge (by Chandy
and others).

To summarize, the Reed/Malony/McCredie paper presents experimental
evidence that the performance of conservative simulation
implementations (of queueing network models) that do not rely on knowledge
about the simulated system often yield limited speedups; the performance of
individual cases depends on their structure.

More recent experimental studies by Fujimoto (Utah) and Lazowska
(Washington) have shown that taking advantage of knowledge about the
model (.e.g., knowing that queues are FIFO) can significantly reduce
the overhead for conservative simulation and yield good speedups (with
accompanying loss of generality).  This approach is highly promising
and I am greatly impressed by the quality of their work. 
As an example, Lazowska and his students have implemented an object-oriented
support package that provides much of the needed information
automatically for the classes in the library (as one example, FIFO
queues).

The balance between simulation programming to express the semantics of
the simulated system and to specify the information needed to
parallelize the simulation is important.  Just as in parallel
programming, the goal is results, not parallelism.  If achieving
parallelism is difficult, time consuming, and overly sensitive to the
model structure, the already difficult problems of simulation
programming will become even more challenging.  For conservative
strategies, I believe, and this is my personal opinion unsupported by
any hard evidence, the best solution is support tools that automatically
obtain the bulk of the information needed to parallelize.  This information
may be obtained by some combination of language structure (e.g.,
declarations), object-oriented libraries (e.g., FIFO queues), or compilers.

There are lots of interesting, open questions in distributed
simulation, both conservative and optimistic.  The next few years will
tell the tale.


Dan Reed
Department of Computer Science
University of Illinois
Urbana, Illinois  61801

reed@a.cs.uiuc.edu