Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!samsung!ernie.viewlogic.com!m2c!umvlsi!dime!dime.cs.umass.edu!moss
From: moss@cs.umass.edu (Eliot Moss)
Newsgroups: comp.arch
Subject: Re: Computer time measurements (Was Re: 64 bits for times....)
Message-ID:
Date: 22 Aug 90 12:42:33 GMT
References: <26012@bellcore.bellcore.com> <11187@alice.UUCP> <1990Aug22.044826.18572@portia.Stanford.EDU>
Sender: news@dime.cs.umass.edu
Reply-To: moss@cs.umass.edu
Organization: Dept of Comp and Info Sci, Univ of Mass (Amherst)
Lines: 36
In-reply-to: kevinw@portia.Stanford.EDU's message of 22 Aug 90 04:48:26 GMT

I do software performance measurement and would *like* resolution down to
the clock rate of the machine. Personally, I generally want to include the
time taken by pipeline stalls, cache misses, etc., since that is relevant to
the user.

I guess what I would really like is elapsed (wall clock) time, CPU time for
the process (split into user and system time), and possibly counters of
other things: instructions executed, memory cycles (maybe split into
reads/writes), cache hits/misses, page translation hits/misses, etc. I don't
think any of this is necessarily *hard* to do, but it does take chip real
estate. The counters should be readable with ordinary instructions, but
maybe settable only with special ones (though if they are kept on a
per-process basis, a process can only screw up itself).

The most important items for general use are elapsed and CPU time, with
resolution down to the machine clock cycle time. Except on machines that
stretch clocks (as opposed to inserting "wait states"), this is not
technologically difficult, though the number of bits required may
necessitate an atomic operation that samples the counter into a special
read-out register, which can then be examined at leisure (and similarly for
setting). At current speeds, I can probably live with 1 microsecond or
100 ns resolution, but it won't be long before we'll need 1 ns resolution or
finer.
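
As a rough illustration of the three figures asked for above (elapsed time, user CPU time, system CPU time), here is a minimal Python sketch built on the operating system's timers. The names `Sample`, `measure`, and `example_block` are hypothetical and exist only for this example. The OS timers used here are typically much coarser than one machine clock cycle, so this shows the shape of the interface rather than the resolution the post asks for.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class Sample:
    wall_ns: int     # elapsed (wall clock) time, in nanoseconds
    user_s: float    # CPU time spent in user mode, in seconds
    system_s: float  # CPU time spent in system (kernel) mode, in seconds


def measure(block):
    """Run block() exactly once and return elapsed, user, and system time."""
    cpu_before = os.times()
    wall_before = time.perf_counter_ns()
    block()
    wall_after = time.perf_counter_ns()
    cpu_after = os.times()
    return Sample(
        wall_ns=wall_after - wall_before,
        user_s=cpu_after.user - cpu_before.user,
        system_s=cpu_after.system - cpu_before.system,
    )


def example_block():
    # Hypothetical workload standing in for the code under test.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    s = measure(example_block)
    print(f"elapsed {s.wall_ns} ns, user {s.user_s:.6f} s, "
          f"system {s.system_s:.6f} s")
```

The block is run once rather than in a loop, so the reading includes whatever cache and pipeline effects that single execution incurs. The price is that the timer's own overhead and granularity can dominate for very short blocks.
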
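
The remark about "the number of bits required" can be made concrete with a little arithmetic. The sketch below assumes a free-running unsigned counter that increments once per tick. The widths (32 and 64 bits) are chosen for illustration, and the function name `wrap_seconds` is hypothetical.

```python
JULIAN_YEAR_S = 365.25 * 24 * 3600  # seconds in a Julian year


def wrap_seconds(bits, tick_ns):
    """Seconds until a free-running counter of the given width wraps around."""
    return (2 ** bits) * tick_ns / 1e9


if __name__ == "__main__":
    for bits in (32, 64):
        # Tick periods from the post: 1 microsecond, 100 ns, 1 ns.
        for tick_ns in (1000, 100, 1):
            s = wrap_seconds(bits, tick_ns)
            print(f"{bits}-bit counter, {tick_ns} ns tick: "
                  f"{s:.3f} s ({s / JULIAN_YEAR_S:.3g} years)")
```

At 1 ns resolution a 32-bit counter wraps in about 4.3 seconds, while a 64-bit counter lasts roughly 584 years. On a machine whose registers are narrower than the counter, the wide value cannot be read in one ordinary instruction, which is where the atomic sample into a read-out register comes in.
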
I should add that all of this is useful to me for measuring the speed of
execution of short blocks of code. I use the numbers to decide among
different ways of implementing things for advanced programming languages.
Repeating an operation over and over tends to give distorted measurements,
since code run in a repeated loop becomes cache resident to a greater degree
than it would in an actual program, etc.
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.edu