Newsgroups: comp.lang.c
Path: utzoo!henry
From: henry@zoo.toronto.edu (Henry Spencer)
Subject: Re: Comparing strings... (long but maybe worth it)
Message-ID: <1990Oct14.095336.2819@zoo.toronto.edu>
Organization: U of Toronto Zoology
References: <2205.271700c2@cc.nu.oz.au> <11485@alice.att.com>
Date: Sun, 14 Oct 90 09:53:36 GMT

In article <11485@alice.att.com> ark@alice.att.com (Andrew Koenig) writes:
>	int compare (register char *p, register char *q)
>	{
>		while (*p) {
>			if (!*q || *p > *q)
>				return GREATER;
>			if (*p++ < *q++)
>				return LESS;
>		}
>		return *q? LESS: EQUAL;
>	}
>
>By this time, the program should be thoroughly incomprehensible...

Some (for example, me) would say that this was aided by your reliance on
implicit comparisons to zero, which hurt readability and do nothing for
your stated goal of greater efficiency.  Why did you bother doing that?

>I imagine that further optimizations are possible...

Definitely.  For one thing, when coding seriously-tight versions of such
things, we never make one comparison after another *inside* a loop to
decide *how* to exit it; we use the shortest possible test to decide
*whether* to exit it, so that the go-around-again path is as quick as
possible.  Examining the various tests in your loop together, the loop
continuation condition is simply `*p == *q'.  So we get (putting back
the explicit comparison while I'm at it):

	while (*p != '\0' && *p == *q)
		p++, q++;

However, we are still doing *p twice, which costs us on many machines
unless our compiler is smart enough to do common-subexpression merging.
We can do this explicitly, at the cost of declaring another (register!)
variable:

	while ((c = *p) != '\0' && c == *q)
		p++, q++;

The one obvious trick we haven't applied to this is merging the increment
operators with the fetches, as in your earlier code.  That actually can't
be done in my first loop because of problems in the after-loop code (which
I am omitting because it's 0530 and I want to wrap this up), but is
viable in the second one:

	while ((c = *p++) != '\0' && c == *q++)
		continue;

(the `continue' being present for readability, normally at no cost to
efficiency).  This is about as greased up as it's going to get without
a radical change in approach.  The after-loop code will need to check
`c' to find out which condition terminated the loop, and will need to
back up pointers and make the detailed comparisons to decide what value
to return.  (If one were tasteless enough to use a goto :-), the `c'
re-check could be avoided, but since it's outside the loop the efficiency
gain is slight.)  Completing it is left as an exercise to the reader.

A further optimization, details left as another optional exercise, is
to define LESS, EQUAL, and GREATER in terms of sign rather than precise
values, and use character subtraction to derive a suitably-signed value,
avoiding a series of tests that slow most machines down somewhat and
do bad things to pipelined machines, which really hate conditional
branches.  For extra credit, discuss how current implementations by
both Berkeley and AT&T attempt this optimization and thereby introduce
a bug (hint:  consider the fact that char is signed on most machines).

Project for the really determined reader:  I said things weren't going
to get much faster without a radical change in approach.  Find one.
(Yes, it exists.)  Discuss tradeoffs and assumptions.
-- 
"...the i860 is a wonderful source     | Henry Spencer at U of Toronto Zoology
of thesis topics."    --Preston Briggs |  henry@zoo.toronto.edu   utzoo!henry