Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!news.funet.fi!hydra!cc.helsinki.fi!wirzenius
From: wirzenius@cc.helsinki.fi (Lars Wirzenius)
Newsgroups: comp.lang.c
Subject: Re: Efficient STRing CoMPares?
Message-ID: <1991Mar16.211154.5568@cc.helsinki.fi>
Date: 16 Mar 91 21:11:54 GMT
References: <1991Mar15.142821@mars.mpr.ca>
Organization: University of Helsinki
Lines: 53

In article <1991Mar15.142821@mars.mpr.ca>, stone@mars.mpr.ca (Darren Stone) writes:
> Does anyone know if there's a better (!?) way to compare
> strings than with strcmp()?
> I'm looking for any speed advantage I can get!

For non-portability, the fastest way would probably be to write inline
assembly code using the machine's string compare instruction.  Of
course, this instruction doesn't exist in all machines, but if it does,
it's probably quite fast. 

Another way (hopefully somewhat more portable) to go is to be clever,
and to avoid string comparisons by modifying the algorithm so they
aren't needed (or are needed less frequently).  

After that you may be able to improve on the comparison method. One way
is to avoid calling strcmp if the first characters differ:

	#define STRCMP(s,t) (*(s) == *(t) ? strcmp((s), (t)) : *(s) - *(t))

(or something along those lines, this is untested). If your strings are
mostly going to differ in the first few bytes, you could pre-compute a
special value based on those bytes, and compare it first:

	#define BITS_IN_CHAR 8
	
	struct string {
		char *str;
		long value;
	};
	
	void compute_value(struct string *s) {
		int i;
	
		s->value = 0;
		for (i = 0; s->str[i] != '\0' && i < sizeof(long); ++i) {
			s->value <<= BITS_IN_CHAR;
			s->value |= s->str[i];
		}
		if (i < sizeof(long))
			s->value <<= BITS_IN_CHAR * (sizeof(long) - i);
	}

(The above could be optimized, but you get the idea.) 

If the pre-computed values differ, you know the strings differ in the
first sizeof(long) bytes, so there's no need to call strcmp. 
Furthermore, if you are careful to arrange the first character as the
most significant byte of the value (and so on), you can even find out
which string (according to the byte values, no locale collation sequence
here) is greater.  Of course, if you need the locale collation sequence,
you have to use strcmp.
-- 
Lars Wirzenius    wirzenius@cc.helsinki.fi