Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!europa.asd.contel.com!noc.sura.net!mars!orion!stodola
From: stodola@orion.fccc.edu (Robert K. Stodola)
Newsgroups: comp.unix.wizards
Subject: Re: accurate runtime accounting (was Load Avarage graph pattern)
Message-ID: <1991Jun18.175921.14843@fccc.edu>
Date: 18 Jun 91 17:59:21 GMT
References: <14081@dog.ee.lbl.gov> <1991Jun12.130441.20640@fccc.edu> <14398@dog.ee.lbl.gov>
Sender: news@fccc.edu (USENET News System)
Organization: Fox Chase Cancer Center, Philadelphia PA
Lines: 51
Nntp-Posting-Host: relay.fccc.edu

In article <14398@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>In article <14081@dog.ee.lbl.gov> I noted that Unix CPU accounting is
>generally fairly poor, and wrote:
>>>The solution is simple but requires relatively precise clocks. ...
>
>In article <1991Jun12.130441.20640@fccc.edu> Stodola says:
>>One of my associates and I did a study of this a number of years ago
>>(actually it was with a PDP-11/70 running IAS). We found that there
>>was substantial clock synchronized usage on the system. The solution
>>we found didn't require very precise clocks at all. Simply one whose
>>rate was relatively prime to the system clock.
>
>This works well in a number of situations, but I believe it will miss
>short-lived processes on modern (fast) machines. Unix boxes generally
>run their scheduling clock in the range 50..500 Hz. Some of these have
>CPUs that run 40 million instructions per second; some things take only
>a few thousand instructions, and it seems intuitively obvious% that
>they might `slip through the cracks'. [%This is research-ese for `We
>did not try it out but we wrote a paper on it anyway.' :-) ]
>
>In other words, I think `PDP-11/70' may be an important constraint
>above. A relatively prime profiling clock is likely to work well on
> [More deleted]

I guess I should have explained the context of the project.
The purpose was to obtain accurate usage information on a per-user
basis, and to provide good average load statistics. If the context
switcher itself doesn't keep this info using a very accurate clock
(i.e., a non-interrupting, read-only clock with megahertz resolution),
you can't ever accurately measure this [actually, we kicked it around
at lunch and tossed out some silly ideas for having another machine on
the bus counting instructions, but the conversation quickly
deteriorated from there].

In this context, the speed of the clock is less important than its
lack of synchronization with the system clock. Those thousand
instructions (taking 1/40000th of a second on the machine you have
postulated) have a one in 500 chance of being interrupted by, say, an
80 Hz clock. So when you see it, you score it with an 80th of a
second. Since you miss it the other 499 times, you get it right on
average. That is, for every 10000 times the code runs, you see it
about 20 times, and score it with 1/80th of a second each time
(20 * 1/80 = 10000 * 1000/40000000 = 0.25 seconds). Speeding up the
clock merely improves the variance for a given number of samples, but
doesn't affect your ability to see a short sequence in a statistical
sense.

Obviously, if you need to know EXACTLY how many cycles were used in a
PARTICULAR clock tick, or EXACTLY how many cycles a PARTICULAR process
used in a PARTICULAR tick, this method doesn't do it. The importance
of the statistical method of measurement is that you avoid the rhythms
imposed by the system clock entirely.

[BTW - we both tried it and wrote the paper :-) ]

--
stodola@fccc.edu -- Robert K. Stodola (occasionally) speaks for himself.
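
The one-in-500 arithmetic above can be checked with a small simulation. This is only a sketch under stated assumptions, not the code from the original study. The names `simulate` and `expected_hits` and the `seed` parameter are made up for illustration. It assumes each burst starts at a uniformly random phase of the sampling clock (that is, the sampling clock is not synchronized with the work), and that a burst is shorter than one sampling period.

```python
import random


def expected_hits(runs, burst_seconds, sample_hz):
    """Average number of bursts that a sampling tick lands in."""
    return runs * burst_seconds * sample_hz


def simulate(runs, burst_seconds, sample_hz, seed=0):
    """Charge one full sampling period to each burst that a tick lands in.

    Hypothetical model: ticks fall at the end of every sampling period,
    and each burst starts at a uniformly random offset within a period.
    Assumes burst_seconds < 1 / sample_hz.
    Returns (hits, charged_seconds, actual_seconds).
    """
    rng = random.Random(seed)
    period = 1.0 / sample_hz
    hits = 0
    for _ in range(runs):
        start = rng.random() * period
        # The burst is seen only if it is still running when the tick fires.
        if start + burst_seconds >= period:
            hits += 1
    charged = hits * period         # what the accounting records
    actual = runs * burst_seconds   # CPU time really used
    return hits, charged, actual


if __name__ == "__main__":
    burst = 1000 / 40_000_000       # 1000 instructions at 40 MIPS
    print("expected hits per 10000 runs:", expected_hits(10_000, burst, 80))
    hits, charged, actual = simulate(1_000_000, burst, 80)
    print("hits:", hits, "charged:", charged, "actual:", actual)
```

With more runs the charged time should settle toward the actual time. Raising `sample_hz` only tightens the scatter around it, which is the point made above about variance.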