Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!gatech!hubcap!matloff%crow.Berkeley.EDU
From: matloff%crow.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff)
Newsgroups: comp.parallel
Subject: Re: Distributed simulation
Message-ID: <5382@hubcap.clemson.edu>
Date: 2 May 89 19:18:11 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 84
Approved: parallel@hubcap.clemson.edu

In article <5367@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:
>In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:

>Although your argument seems quite reasonable, it is, alas, untrue.
>First of all, note that the steady-state of a stochastic process is
>represented not by a single value, but by a *distribution* of that value.

Actually, Law's statement that you quoted is self-contradictory, it seems
to me.  He is apparently willing to use the mean in one-dimensional
problems but not multidimensional problems, which doesn't make sense.

>Therefore, what you really need to do is estimate the distribution of the
>steady-state, and then start up a number of runs, with the intitial
>conditions chosen at random from that distribution.

True, this would be optimal.  But does it make that much difference?
Unfortunately, I don't have time to go read the paper right now, but
I would guess that it really doesn't make that much difference, and
that using the mean is nearly optimal in terms of *result*.  In other
words, even though 9 and 15 are far apart in the excerpt quoted below,
I would guess that E(D9) and E(D15) are NOT far apart.

>     Let i(l,0.01) [be] the smallest point [value of i]
>     beyond which all values of E(Di) fall withing 1 percent of d,
>     viewed as a function of the initial number in the system, l.
>     Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
>     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
>     is minimized for l = 15.  This result is interesting since
>     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
>     the steady-state time-average number of customers in system is 9.
>     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]

>> Or, better yet, in systems which have regeneration points and don't
>> have huge (in the notation from my posting yesterday, >> T/P)
>> regeneration cycle times, one can simply start each replicate at
>> the regeneration point  --  thus no initialization bias at all.

>True, the method of regeneration theoretically does not suffer from the
>initialization bias problem.  But "theoretically" is the key word here,
>because unless what you are simulating is truly a regenerative process,
>your regeneration points are only approximate.  

Again, I think approximate is good enough in many cases.  Also, I would
mention that I've seen a number of specific applications in which it was
claimed that there was no regeneration point, but that if things like
symmetries are taken into account, it turns out that there are good
regeneration points which one can use.

>> Of course, you do run into the problem you mentioned, i.e. the
>> problem of "waiting for the slowest one," which gave you your
>> P/log P figure.  However, analogous kinds of problem should plague
>> the task-partitioning approach too, e.g. as P increases, the
>> problem of idle time should get worse and worse.

>But, since the number of replications being run when intra-replication
>parallelism is used in less than P, the magnitude of the effect is smaller.

No, what I am saying is that even with no replication at all, there will
be **analogous** straggler effects, due to interprocess dependencies.

>> The task-partitioning approach also has a scaling problem.  For
>> a fixed application, say a 10-server queue, there will be a very
>> small limit on the usable value of P.  After a certain point,
>> you just can't find anything for the processors to do.

>Definitely true, which is why the best approach for a large number of
>processors is probably a combination of inter- and intra- replication
>parallelism.

That could well be true.

>> In most simulations of queueing systems in practice, memory should not
>> be a problem.  The length of the event list should be on the order of 
>> magnitude of the number of servers, which is generally not large.

>Depends on what is being simulated.  I have friends at Bellcore who were
>simulating a metropolitan trunk network in which the steady-state size of
>the event list was about 15 THOUSAND events.

Yes.  John Comfort once mentioned in a conference that he had seen one
case in which there were typically 100,000 events in the list.  But that's
why I said "most."

   Norm