Newsgroups: comp.lang.c
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!sarah!bingnews!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Subject: Re: Log Library - How is it done in the library code?
Message-ID: <1991Mar20.204034.28931@bingvaxu.cc.binghamton.edu>
Organization: State University of New York at Binghamton
References: <702@newave.UUCP> <1991Mar16.201655.6104@bingvaxu.cc.binghamton.edu> <1991Mar20.173249.3819@zoo.toronto.edu>
Date: Wed, 20 Mar 1991 20:40:34 GMT

In article <1991Mar20.173249.3819@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1991Mar16.201655.6104@bingvaxu.cc.binghamton.edu> kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
>>So we see that on _some_ hardware (like 68k's) the library routines are
>>at an apparent _big_ disadvantage...
>
>No, actually, we see that on some hardware/software combinations the library
>routines are at a big disadvantage. In particular, on that Sun 3/60, did
>you compile with -f68881 and use the inlining facility for the math library?
>If not, you were timing the calling overhead, not the log function.

No, usually I just say `-O4' and let it go at that. However, if you
_wanna_ see what happens with `inlining', on a Sun 3 you get this (I was
surprised):

    -O4        -f68881    -O4/-f68881  -fsoft     -O4/-fsoft
    0.356631   0.414894   0.280899     0.360215   0.355993

Apparently the `inline' option -f68881 does cut down somewhat on
(presumably) the calling overhead to (essentially only) the log function.
The global analysis (+ a few fancy other things that don't really apply to
this program) done by -O4 is _almost_ as good as inlining (i.e. calling
overhead of about (0.415-0.357)/0.415 = 14% seems to have been eliminated).

However, look at the combination of -O4 and -f68881! A bit hard to
understand how things can go _backwards_ for the library routine by simply
doing both things.
Instead of my little subroutine running only about 3 times faster,
combining both switches makes it run almost 4 times faster than the library
routine! Amazing! Perhaps Henry might explain this one (my brain is hurting
at the moment)?

As a kind of joke -- and a slight counterexample to Henry's statement
above -- I tried the -fsoft option that, I presume (from the man page
anyway), restricts everything to using software fp routines. The figure
comes out almost the same as the original -O4 result (0.360 vs. 0.357).

Perhaps there _isn't_ that much variation on a given piece of hardware,
despite the various fancy compiler options. (Although I presume a DIFFERENT
compiler and library on the same platform might behave quite differently.)

Cheers,

-kym