Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbatt!ucbvax!sdcsvax!darrell
From: darrell@sdcsvax.UUCP
Newsgroups: mod.os
Subject: Re: Performance analysis of computer systems
Message-ID: <2618@sdcsvax.UCSD.EDU>
Date: Tue, 27-Jan-87 22:50:21 EST
Article-I.D.: sdcsvax.2618
Posted: Tue Jan 27 22:50:21 1987
Date-Received: Thu, 29-Jan-87 03:33:16 EST
Sender: darrell@sdcsvax.UCSD.EDU
Organization: NASA Ames Research Center, Mountain View, CA
Lines: 46
Approved: mod-os@sdcsvax.uucp

--

In article <2614@sdcsvax.UCSD.EDU> fouts@orville%ames.arpa (Marty Fouts) writes:
>--
>
>The method I use for performance analysis depends heavily on the problem
>being investigated.  Most of my work is in measurement and tuning of
>operating systems, so I usually start by instrumenting the system of
>interest and then performing statistical analysis on the results.
>
>[Could you explain how you "instrument" the system?  -DL]
>

Sure, instrumentation can be done in two ways.  When you are very lucky,
you can use an external hardware monitor to sample the state of the system
(usually PS and some status registers) and then later run the samples through
software which correlates them to software states.  This is the 'easy' way.
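
The correlation step can be sketched roughly as follows.  This is only an
illustration: the status-word layout (bit 0 meaning system mode) and the names
SYSTEM_MODE_BIT, classify and summarize are assumptions made up for the
example, not the format of any particular machine or monitor.

```python
from collections import Counter

# Hypothetical layout: bit 0 set means the CPU was in system (kernel) mode.
SYSTEM_MODE_BIT = 0x1


def classify(ps_word):
    """Map one sampled status word to a software state name."""
    return "system" if ps_word & SYSTEM_MODE_BIT else "user"


def summarize(samples):
    """Return the fraction of samples that fell in each state."""
    counts = Counter(classify(s) for s in samples)
    total = len(samples)
    return {state: n / total for state, n in counts.items()}


# Samples as a hardware monitor might have logged them (made-up values).
samples = [0x0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x1, 0x0]
print(summarize(samples))
```

A real reduction program would decode more fields than one bit, but the shape
is the same: raw samples in, per-state tallies out.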

When you are not lucky, you modify the operating system to increment counters
based on periodic state checks (user versus system state, for example) or on
the occurrence of events (I/O completion, for example).  Sometimes you check
periodic data at event occurrence, like recording the amount of idle time
accumulated by the process which is about to be made runnable as a result of
an I/O completion.
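
A minimal sketch of the counter approach, written as a user-level model rather
than real kernel code.  The class and method names (Counters, on_clock_tick,
on_io_completion) are hypothetical; in a real system the increments would sit
in the clock interrupt handler and the I/O completion path.

```python
class Counters:
    """Software counters of the kind one might add to an operating system."""

    def __init__(self):
        self.user_ticks = 0        # periodic checks that found user state
        self.system_ticks = 0      # periodic checks that found system state
        self.io_completions = 0    # event count
        self.idle_at_wakeup = 0    # idle time summed at each I/O completion

    def on_clock_tick(self, in_system_mode):
        """Periodic state check: bump the counter for the current state."""
        if in_system_mode:
            self.system_ticks += 1
        else:
            self.user_ticks += 1

    def on_io_completion(self, process_idle_time):
        """Event hook: count the event and record periodic data with it."""
        self.io_completions += 1
        self.idle_at_wakeup += process_idle_time


c = Counters()
for mode in (False, False, True, False):   # four clock ticks
    c.on_clock_tick(mode)
c.on_io_completion(process_idle_time=7)    # idle times in arbitrary units
c.on_io_completion(process_idle_time=3)
print(c.user_ticks, c.system_ticks, c.io_completions, c.idle_at_wakeup)
```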

There are three major problems here, along with a number of gotchas I won't
go into.  First is the autocorrelation problem.  If the samples are always
taken on a major clock tick, they may reflect state which is dependent on
the tick having just happened.  This can cause performance data to be
skewed in sometimes subtle ways.
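
The effect can be shown with a toy simulation.  The numbers are assumptions
chosen for illustration: a 10 ms clock tick, with 1 ms of system-mode work
immediately after each tick and user mode the rest of the time, so the true
system fraction is 10%.

```python
import random

TICK = 10.0     # ms between clock ticks (assumed)
HANDLER = 1.0   # ms of system-mode work right after each tick (assumed)


def in_system_mode(t):
    """True if the simulated machine is in system mode at time t (ms)."""
    return (t % TICK) < HANDLER


def system_fraction(sample_times):
    """Fraction of the samples that observed system mode."""
    hits = sum(in_system_mode(t) for t in sample_times)
    return hits / len(sample_times)


n = 10000
rng = random.Random(1)  # fixed seed so the run is repeatable

# Sampling exactly on the tick always lands inside the tick-driven work.
on_tick = [i * TICK for i in range(n)]

# Sampling at a random offset within each tick period breaks the correlation.
jittered = [i * TICK + rng.uniform(0, TICK) for i in range(n)]

print("on tick: ", system_fraction(on_tick))
print("jittered:", system_fraction(jittered))
```

Sampling on the tick reports the machine as always in system mode, while the
jittered samples come out close to the true 10%.  Real skew is rarely this
blatant, which is what makes it hard to spot.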

Second is the interaction problem.  Adding code to an operating system always
changes the timing of the system.  Sometimes it doesn't impact the feature
being measured, but you can never tell for certain.  Sometimes, especially
when measuring real time systems, instrumentation can have an adverse impact
on the system.  Adding 0.1 millisecond of CPU time to a routine called once
a millisecond takes another 10% of the CPU, which can have a substantial
impact on a system.
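
The arithmetic behind that example, as a small helper.  The function name
overhead_fraction is made up for the illustration; it only multiplies the
added cost per call by the call rate.

```python
def overhead_fraction(added_ms_per_call, calls_per_ms):
    """Fraction of one CPU consumed by instrumentation added to a routine."""
    return added_ms_per_call * calls_per_ms


# 0.1 ms added to a routine that runs once per millisecond.
print(overhead_fraction(0.1, 1.0))
```

The same 0.1 ms added to a routine called once a second would cost only a
hundredth of a percent, so the call rate matters as much as the added cost.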

Third is the capture problem.  Determining how to retrieve information being
gathered in real time in a way which creates a consistent view of the system
can be a major problem.  You want to have all of the data consistent at some
point in time and then to be able to capture all of the data in an atomic
action, and that usually isn't possible.  Also, you have to figure out where
to put all of the data you are capturing.  Sometimes you are generating
enough data to require some data reduction be performed in real time.
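
One common way to approximate an atomic capture is a version counter that the
writer bumps before and after each update, with the reader retrying if the
version changed or is odd.  This is offered as one possible technique, not the
method described above, and the names (Stats, record_request, snapshot) are
hypothetical.  It is a single-threaded model of the logic only; real kernel
code would also need memory barriers or interrupt masking.

```python
class Stats:
    """Two related counters guarded by a version number."""

    def __init__(self):
        self.version = 0   # odd while an update is in progress
        self.started = 0
        self.finished = 0

    def record_request(self):
        self.version += 1      # now odd: readers must not trust the data
        self.started += 1
        self.finished += 1
        self.version += 1      # even again: update complete


def snapshot(stats, max_tries=100):
    """Return (started, finished) read under a single, stable version."""
    for _ in range(max_tries):
        before = stats.version
        if before % 2:         # writer is mid-update, try again
            continue
        copy = (stats.started, stats.finished)
        if stats.version == before:
            return copy        # nothing changed while we were copying
    raise RuntimeError("could not get a consistent snapshot")


s = Stats()
s.record_request()
s.record_request()
print(snapshot(s))
```

The reader never blocks the writer, which matters when the writer is an
interrupt handler, at the price of occasionally having to retry the copy.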

--