Xref: utzoo comp.lang.c:31661 comp.unix.questions:25290
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!think.com!paperboy!snorkelwacker!bloom-beacon!eru!hagbard!sunic!mcsun!unido!mikros!mwtech!martin
From: martin@mwtech.UUCP (Martin Weitzel)
Newsgroups: comp.lang.c,comp.unix.questions
Subject: Re: ---- Running time & code efficiency ----
Message-ID: <905@mwtech.UUCP>
Date: 6 Sep 90 21:02:01 GMT
References: <742@babcock.cerc.wvu.wvnet.edu>
Reply-To: martin@mwtech.UUCP (Martin Weitzel)
Organization: MIKROS Systemware, Darmstadt/W-Germany
Lines: 130

In article <742@babcock.cerc.wvu.wvnet.edu> siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>Hi. I am trying to analyze my C code and make it run faster.
>The program is to translate files in one format to another.
>I wonder if there is a generic way to decide a lower bound
>of the running time for this kind of problem?

Your description of the problem is too vague to decide about a
lower bound - my first guess would be that the time is proportional
to the length of the file read and the file written and of course
somewhat higher than reading and writing two such files without any
conversion.

If you work under UNIX, you may use cat, cp or dd to copy your
input file to /dev/null, resulting in a time I call "r_time".
Next take a file of the length of your output-file an do the
same, resulting in a time I call "r0_time". Finally just copy the
second file (not to /dev/null!), resulting in a time I call
"w_time". Now, the lowest bound - without any conversion -
for your problem is around "r_time + w_time - r0_time". (Here
I assumed that standard utilities like the one I mentioned
are well adapted to the system - eg. they use Standard-I/O
with a blocksize that matches the appropriate values for the
given file system - an assumption which may sometimes be wrong!)

>I know function calls consume time so I look through
>the code and cut off some unnecessary ones;
>I know you can trade space for speed, however I wonder how far
>you can go along this direction. But I cannot get much from these
>considerations. Any comments? Suggections?

Again, this depends so much on your problem, that few can be said.
What kind of conversion is it, you want to make? With simple
byte-to-byte conversions you may speed things up using tables
for look-up instead of compares. If the required conversions
depend on the context, you may switch between several tables.
For certain more complex conversions where tables are less
helpfull, you may eventually trade-in speed for space with
caching or pre-evaluating common patterns - but how much this
would help depends further on the typical input data.

>What is your favorate tool for timing analysis? I used "prof" but
>it did not give me the timing imformation for all functions in
>my program -- just a few functions were listed. I also used the
>system provided function called "times", it is pretty good.

Under UNIX prof is a good starting point. Note that prof gathers
the timing by statistical methods: At run-time in fixed intervals
the value of the program counter is sampled (only for profiled
programs, of course). This value is scaled and mapped into cells
of an array of counters, which is located in a special run-time
start-up module that is linked to your program if "-p" is specified
at this step. The special run-time start-up writes this array later
to the "mon.out" file. With the command prof the "mon.out" is mapped
mapped back to locations in the actual functions of the programs, using
the information of the name tables of "a.out" that are normally left
there by the linker for the purpose of debugging.

It's a little bit like if you stand up every sixty minutes from your
desk, walk to the window and make some notes of what you see outside.
After doing this for several weeks, your notes may be a good
approximation of what happens in the street outside your window.
But you may well have missed important events. Especially, if you
do this *exactly* every full hour, then noting the positions
of the hands of a clock out on the street will give a very wrong
impression of the outside world (if you base a description of the
outside world exclusively on your notes, not on common experience
with clocks :-)).

You should allways look to the results of prof with this in mind.
Chances are that very small functions are missed or timings are
mapped not to the true culprit, but in general you will get a good
approximation (Especially those "stroboscobic" effects I described at
the end of the last paragraph will occur very very very seldom.)

If you use prof you should also have a look at the number of calls
for every function. If you compile with the "-p" switch from the
source to the object, the compiler inserts a little extra code at
the start of each function to count the number of calls; you will
normally not have this extra code in the standard library, so that
you will see #calls-counts only for functions you've written yourself.
But on most systems there is a special compiled standard library
available. The usual name for this library is /lib/libc_p.a so you
can simply add "-lc_p" at the end of the command line in which you
call the link phase to produce the executable program, and all the
standard functions are taken from this special library instead from
the regular one.

As prof prints the number of calls for all functions, if a function seems
to consume much time but has a low number of calls (and vice versa)
this must not be wrong but should look suspicious to you and deserve
further investigation. Same if the order of the function in the source
file or the order in which the object files are given to the linker
change the timings substantially.

>The last questions: why doesn't the "time" command give me an identical
>user time each time I run the same thing? In the "time"'s report,
>is the time spent on the function calls by the system accounted in
>the user time or system time?

The user (and sys) time are also meassured with a (very similar)
statistical method as described above for the profiling (the only
difference is that no special run-time start-up module is required
as the program counter is not sampled; only the fact if the process
happens to be in "user-" or "system-mode" is noted with the time.)
The user(-mode) time accounts for the code *you* have written. You
have influence on this time by changing your code. The system time
accounts for the code the kernel spends for request from your
program(%). To change this time execute fewer system calls or make
the system calls have less work (or hire some people to rewrite
the kernal for you; as you need the kernal-source for this, you
should be prepared to have some $$ left to buy it from AT&T - I'd
say some hundredthousands should usually suffice :-)).

%: I'm aware that there is a small amount of error in this time due
to activity caused by interrupts. But this will only be of importance
in very special situations and need not be considered here.

>Thank you for your help and have a nice day.

Have a nice day too and if you can save the time, walk to the next
book store (or your campus library if you are short on money) and
look if they have copies of the following two:

	"Writing Efficient Programs" by Jon Bentley
and
	"Efficient C" by Thomas Plum and Jim Brodie

You'll not be disappointed and you can learn much from both books!
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83