Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbatt!ucbvax!sdcsvax!darrell
From: darrell@sdcsvax.UUCP
Newsgroups: mod.os
Subject: Re: Performance analysis of computer systems
Message-ID: <2618@sdcsvax.UCSD.EDU>
Date: Tue, 27-Jan-87 22:50:21 EST
Article-I.D.: sdcsvax.2618
Posted: Tue Jan 27 22:50:21 1987
Date-Received: Thu, 29-Jan-87 03:33:16 EST
Sender: darrell@sdcsvax.UCSD.EDU
Organization: NASA Ames Research Center, Mountain View, CA
Lines: 46
Approved: mod-os@sdcsvax.uucp

--
In article <2614@sdcsvax.UCSD.EDU> fouts@orville%ames.arpa (Marty Fouts) writes:
>--
>
>The method I use for performance analysis depends heavily on the problem
>being investigated. Most of my work is in measurement and tuning of
>operating systems, so I usually start by instrumenting the system of
>interest and then performing statistical analysis on the results.
>
>[Could you explain how you "instrument" the system? -DL]
>

Sure, instrumentation can be done in two ways. When you are very lucky,
you can use an external hardware monitor to sample the state of the
system (usually PS and some status registers) and then later run the
samples through software which correlates them to software states. This
is the 'easy' way.

When you are not lucky, you modify the operating system to increment
counters based on periodic state checks (user versus system state, for
example) or on the occurrence of events (such as I/O completion).
Sometimes you check periodic data at event occurrence, like recording
the amount of idle time accumulated by the process which is about to be
made runnable as a result of an I/O completion.

There are three major problems here, along with a number of gotchas I
won't go into. First is the autocorrelation problem. If the samples are
always taken on a major clock tick, they may reflect state which is
dependent on the tick having just happened. This can cause performance
data to be skewed in sometimes subtle ways.

Second is the interaction problem.
Adding code to an operating system always changes the timing of the
system. Sometimes it doesn't impact the feature being measured, but you
can never tell for certain. Sometimes, especially when measuring
real-time systems, instrumentation can have an adverse impact on the
system. Adding 0.1 millisecond of CPU time to a routine called once a
millisecond consumes 10% of the CPU, which can have a substantial
impact on a system.

Third is the capture problem. Determining how to retrieve information
being gathered in real time in a way which creates a consistent view of
the system can be a major problem. You want to have all of the data
consistent at some point in time and then to be able to capture all of
the data in an atomic action, and that usually isn't possible. Also,
you have to figure out where to put all of the data you are capturing.
Sometimes you are generating enough data to require that some data
reduction be performed in real time.

--
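
One common way to approach the capture problem is to keep two counter
buffers and swap them, so the reader always gets a set of counts that
all describe the same interval. The following is only a minimal sketch
of that idea, written in Python for readability rather than in kernel
code. The class and event names are hypothetical, and the lock merely
stands in for whatever a real system would use to make the swap atomic
(masking interrupts, for example). Keeping counts instead of raw events
is also a simple form of real-time data reduction.

```python
import threading
from collections import Counter


class DoubleBufferedCounters:
    """Hypothetical event counters with an atomic snapshot-and-reset."""

    def __init__(self):
        # The lock is an assumption standing in for the kernel's own
        # mechanism for making a short critical section atomic.
        self._lock = threading.Lock()
        # Buffer that the instrumented code currently writes to.
        self._active = Counter()

    def record(self, event, amount=1):
        # Called from the instrumented path. Keep it short: every cycle
        # spent here perturbs the system being measured.
        with self._lock:
            self._active[event] += amount

    def capture(self):
        # Swap in a fresh buffer and hand back the old one. No record()
        # call can land in the middle of the swap, so every count in the
        # returned buffer covers the same interval.
        with self._lock:
            retired, self._active = self._active, Counter()
        return dict(retired)


counters = DoubleBufferedCounters()
counters.record("io_completion")
counters.record("io_completion")
counters.record("idle_ticks", 5)

first = counters.capture()    # counts since start
second = counters.capture()   # nothing recorded since the first capture
print(first)
print(second)
```

The swap itself costs little, but the lock in record() is exactly the
kind of added code the interaction problem warns about, so in practice
the cost of the critical section has to be measured too.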