Path: utzoo!dciem!nrcaer!sce!cognos!jimp From: jimp@cognos.uucp (Jim Patterson) Newsgroups: comp.arch Subject: Re: Endian wars Message-ID: <5124@aldebaran.UUCP> Date: 25 Jan 89 17:24:36 GMT Article-I.D.: aldebara.5124 References: <6133@columbia.edu> Reply-To: jimp@cognos.UUCP (Jim Patterson) Organization: Cognos Inc., Ottawa, Canada Lines: 64 In article <6133@columbia.edu> eppstein@garfield (David Eppstein) writes: >Big endian lets you use integer comparison instructions to do string >compares a word at a time. Little endian means you are stuck with a >byte at a time. I'd like to see an algorithm that actually benefits from this. Consider... If you know how long the string is ahead of time, you can optimize the first (n / 4) int's (assuming 4-byte ints) whether or not it's big-endian. If it's big-endian, then the first non-equal match indicates the result. If it's little-endian, you have to switch to a byte-wise loop for the non-equal word which means up to four bytes are checked twice. In the C library, this approach only applies to memcmp. You have to treat the last word specially if it's not an even multiple of the word size with the big-endian approach. Otherwise, you will get the wrong answer if the portion that shouldn't be matched is the only part that is different. The little-endian approach already checks any non-matching word, so if designed right this would not be a special case. If you want to do signed-byte comparisons, the big-endian word-oriented approach won't work (you will have to do it similar to the little-endian approach). The reason is that the sign of the result will indicate the relative ordering of the int's, and not of the bytes that mismatch. If you don't know how long the string is (as for strcmp and strncmp), then you have to scan the string to find how long it is. For a word-oriented algorithm to be effective here, you need an algorithm which detects which byte of a word (if any) contains a NUL. I contend that there's no simple way to do this with integer instructions; it's more effective to use byte-oriented instructions. The only word-oriented approaches I can think of are along these lines. has_NUL = ! (i & 0xff && i & 0xff00 && i & 0xff0000 && i & 0xff000000); If you have to scan the string as bytes anyways, I think that it would be more efficient to compare them with the other string at the same time. Maybe someone has a better algorithm. This one is sure to be worse than using byte-oriented instructions, unless your machine just doesn't have any or they are woefully inadequate. A straight table-lookup is obviously out of the question. In summary, a big-endian machine might gain a slight advantage if it knows ahead of time how long the string is. Really, the only difference is that the big-endian machine can run along some fixed number of words, and return the result from the first non-equal match, whereas the little-endian algorithm has to double-check the unmatching word with byte instructionsbefore it can return. For varying-length strings, I see no advantage to the word-oriented approach. If anyone has any good word-oriented implementations of memcmp or especially strcmp, I'd be interested in seeing them. -- Jim Patterson Cognos Incorporated UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707 PHONE:(613)738-1440 3755 Riverside Drive Ottawa, Ont K1G 3Z4