Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!gatech!hubcap!wagner
From: wagner@june.cs.washington.edu (Bullwinkle J. Moose)
Newsgroups: comp.parallel
Subject: Re: Distributed simulation
Summary: When intuition fails...
Message-ID: <5367@hubcap.clemson.edu>
Date: 2 May 89 12:37:11 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 125
Approved: parallel@hubcap.clemson.edu

In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:
> In article <5312@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) (Hey!  that's me!) writes:
> >Because of the presence of an initialization bias in any
> >stochastic simulation, it is important to run each replication long
> >enough to ensure that the confidence intervals that one obtains for
> >some estimator actually contain the true value of what is being
> >estimated.
> 
> Yes, as I recall, that was the theme of Heidelberger's paper.  But
> it seems to me that it's not so simple.
> 
> For example, in typical applications, a number of runs will be made
> of the same simulation program, with different parameters, e.g.
> different arrival rates.  After doing the first run, the user has
> a pretty good idea of the steady-state queue lengths, and can
> initialize the other runs with nonempty queues of those approximate
> sizes.  This should reduce the initialization bias quite substantially.

Although your argument seems quite reasonable, it is, alas, untrue.
First of all, note that the steady-state of a stochastic process is
represented not by a single value, but by a *distribution* of that value.
Therefore, what you really need to do is estimate the distribution of the
steady-state, and then start up a number of runs, with the intitial
conditions chosen at random from that distribution.  (Note that if you
knew the distribution of steady-state conditions exactly, then there would
be no point to doing the simulation at all!)

As an example of what happens if steady-state is chosen to be
the average value instead of a set of samples from the (estimated)
distribution, I refer you to "Statistical Analysis of Simulation Output
Data", by A.M. Law, Oper. Res. 31(6):983-1029 (June 1983).  In the
following excerpt from pages 1015-1017, my own comments are enclosed in [...]:

    "The suggested procedure [using pilot runs to estimate steady-state]
     would be easy to apply in the case of an M/M/1 queue, since the
     state of the system is a single integer-valued r.v., namely, the
     number of people in the system.  In the case of a complex real-world
     simulation, however, this procedure may require estimating the joint
     distribution of a large number of r.v.'s, some of which may be continuous.
     As a simple example of this difficulty, consider a single-server
     queueing system without exponential interarrival times or service times.
     In this case, we would have to estimate the joint distribution of number
     in system, time to the next arrival, and the remaining service time of
     the customer in service (if any).

     [ In the following paragraph, Di is defined to be the delay experienced
     by the ith customer to arrive at the server, and d is defined to be the
     steady-state expected average delay, i.e. lim(n-->oo) (D1+D2+...+Dn)/n]

    "Kelton and Law ["The Transient Behavior of the M/M/s Queue with
     Implications for Steady-State Simulation", TR 82-7, Dept. of MIS,
     U. of Arizona, 1982] computed E(Di) as a function of i and the number
     in system at time 0, say, l [lower-case l, not 1] for various M/M/s
     queues....
     Let i(l,0.01) [be] the smallest point [value of i]
     beyond which all values of E(Di) fall withing 1 percent of d,
     viewed as a function of the initial number in the system, l.
     Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
     is minimized for l = 15.  This result is interesting since
     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
     the steady-state time-average number of customers in system is 9.
     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]

In general, Law's article is an excellent guide to the pitfalls (his word)
of using classical statistical reasoning in the analysis of simulation
output data.  Not to mention intuition!

> Or, better yet, in systems which have regeneration points and don't
> have huge (in the notation from my posting yesterday, >> T/P)
> regeneration cycle times, one can simply start each replicate at
> the regeneration point  --  thus no initialization bias at all.

True, the method of regeneration theoretically does not suffer from the
initialization bias problem.  But "theoretically" is the key word here,
because unless what you are simulating is truly a regenerative process,
your regeneration points are only approximate.  And using regeneration
would definitely introduce the problem of stragglers (see my last
posting), because it might lead to a wide variation in completion time
of the replications unless the number of regeneration cycles within
each replication is very large.

> Of course, you do run into the problem you mentioned, i.e. the
> problem of "waiting for the slowest one," which gave you your
> P/log P figure.  However, analogous kinds of problem should plague
> the task-partitioning approach too, e.g. as P increases, the
> problem of idle time should get worse and worse.

But, since the number of replications being run when intra-replication
parallelism is used in less than P, the magnitude of the effect is smaller.
I didn't claim that the "tail effect" problem was a major one, only that
it "nudges the scale" a bit in the direction of intra-replication parallelism.

> The task-partitioning approach also has a scaling problem.  For
> a fixed application, say a 10-server queue, there will be a very
> small limit on the usable value of P.  After a certain point,
> you just can't find anything for the processors to do.

Definitely true, which is why the best approach for a large number of
processors is probably a combination of inter- and intra- replication
parallelism.

> 
> So I don't think that either approach is a clear win.  But the
> replicates approach certainly is much simpler.
> 
> >Finally, a pragmatic consideration against running multiple independent
> >replications in parallel is that some simulations have enormous memory
> >requirements,
> 

> In most simulations of queueing systems in practice, memory should not
> be a problem.  The length of the event list should be on the order of 
> magnitude of the number of servers, which is generally not large.

Depends on what is being simulated.  I have friends at Bellcore who were
simulating a metropolitan trunk network in which the steady-state size of
the event list was about 15 THOUSAND events.

"All generalizations are false, including this one."

		Dave Wagner
		University of Washington Comp Sci Department
		wagner@cs.washington.edu
		{cornell,harvard,ssc-vax,tektronix}!uw-beaver!wagner