Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!amdcad!bcase
From: bcase@amdcad.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: String Processing Instruction
Message-ID: <15292@amdcad.UUCP>
Date: Wed, 25-Mar-87 14:13:49 EST
Article-I.D.: amdcad.15292
Posted: Wed Mar 25 14:13:49 1987
Date-Received: Fri, 27-Mar-87 04:24:48 EST
Organization: AMDCAD, Sunnyvale, CA
Lines: 58
Xref: utgpu comp.arch:661 comp.lang.c:1325

There was a discussion a few months ago about processing strings more efficiently than a byte at a time. The Am29000 takes one of the possible approaches to improving string processing performance....

One unique feature of the Am29000 architecture is a special instruction. This instruction is intended to speed up string processing, but my guess is that other uses will be discovered. The instruction is called "compare-bytes" and works like this:

Compare-bytes specifies two source register operands and one destination register operand. The 4 pairs of corresponding bytes of the two 32-bit source operands are compared for equality (i.e., the two most-significant bytes are compared, the two next-most-significant bytes are compared, etc.). If any of the four pairs is equal, then the destination register is set to the value "TRUE" (which on the Am29000 is a one in the most-significant bit with all other bits cleared to zero). If none of the four pairs is equal, then the destination register is set to "FALSE" (all bits cleared).
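
The semantics just described can be modeled in portable C roughly as follows. This is only a sketch for illustration: the names compare_bytes(), AM29K_TRUE and AM29K_FALSE are made up here and are not taken from the Am29000 documentation or from the library code discussed in this article.

```c
#include <stdint.h>

/* Hypothetical names for the Am29000 Boolean values described above. */
#define AM29K_TRUE  0x80000000u  /* one in the most-significant bit, rest zero */
#define AM29K_FALSE 0x00000000u  /* all bits cleared */

/*
 * Software model of compare-bytes: returns AM29K_TRUE if any of the
 * four corresponding byte pairs of a and b are equal, else AM29K_FALSE.
 */
uint32_t compare_bytes(uint32_t a, uint32_t b)
{
    /* A byte of x is zero exactly when the corresponding bytes match. */
    uint32_t x = a ^ b;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        if (((x >> shift) & 0xFFu) == 0)
            return AM29K_TRUE;
    }
    return AM29K_FALSE;
}
```

Calling compare_bytes(word, 0) then answers the question "does this word contain a null byte?", which is the string-processing use described next.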
(Am29000 conditional branch instructions test only the most significant bit of a register; condition codes are not used, so we get a free "test for negative.")

So, if one of the source operands is set to all zeros (four null characters), which can be specified in the instruction by choosing the second operand as the zero-extended eight-bit constant zero, and the other operand is a word of the character string being dealt with (say for copying or comparing), the Am29000 can, in one cycle (not counting the branch), determine whether the word contains the end-of-string character (according to the C language definition of a string). If the word does not contain the end-of-string character, then the four bytes in the word can be manipulated (e.g. loaded or stored) as a unit. Word operations on the Am29000 are much more efficient than character operations (this is true of most machines, though).

There are, of course, special circumstances to deal with (such as misaligned strings; we have a funnel shifter to help in those cases), but by using the compare-bytes instruction in the library routines strcpy() and strcmp() (and strlen() too, but we haven't bothered since it seems never to be used in the programs we have encountered), significant improvements in the run time of many C programs can be realized. Another thing that really helps is to have the compiler word-align literal strings (and I have implemented this), but even with word alignment, some substrings will begin on strange boundaries and must be dealt with correctly.

My approach to using this instruction consisted of rewriting the library routines in C with function calls wherever the compare-bytes instruction should go. I compiled this C code with my compiler, changed the assembly code to eliminate the function calls in favor of the compare-bytes instruction, and assembled it into the library (actually a module of code that gets included in all final links, but that is just a detail of our simple environment).
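
To give a feel for the word-at-a-time idea, here is a minimal sketch of a strcpy()-style routine in portable C. It is not the library code described above: word_has_nul() and strcpy_by_words() are hypothetical names, word_has_nul() merely stands in for a compare-bytes against zero, and the sketch assumes the source buffer can be read in whole 4-byte words starting at src (i.e. it is padded to a multiple of four bytes), which sidesteps the misalignment cases mentioned earlier.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for "compare-bytes against zero": nonzero if any byte of w is 0. */
static int word_has_nul(uint32_t w)
{
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        if (((w >> shift) & 0xFFu) == 0)
            return 1;
    }
    return 0;
}

/*
 * Copy the string at src to dst, four bytes at a time where possible.
 * Assumption: src is readable in whole 4-byte words from its start
 * through the word containing the terminating null character.
 */
char *strcpy_by_words(char *dst, const char *src)
{
    char *d = dst;
    uint32_t w;

    for (;;) {
        memcpy(&w, src, sizeof w);   /* word load */
        if (word_has_nul(w))
            break;                   /* end of string is inside this word */
        memcpy(d, &w, sizeof w);     /* word store */
        src += sizeof w;
        d += sizeof w;
    }

    /* Finish the last partial word one byte at a time, including the null. */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```

Only words known to contain four real characters are stored, so dst needs no more room than an ordinary strcpy() would require. On the Am29000 the word_has_nul() call would be the place where the single compare-bytes instruction goes.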

Since most C programs (especially utilities and other systems programs) do a lot of string processing, this one instruction is really worth the small implementation cost. It often improves run times by 15% to 20% (just goes to show that the impact of processing C language strings has been long ignored). It implements just the right semantics and probably has other applications for specialized pattern matching.

I just thought some of you would be interested.

    bcase