Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ucbcad!ucbvax!amdcad!bcase
From: bcase@amdcad.UUCP
Newsgroups: comp.arch
Subject: Re: String Processing Instruction
Message-ID: <15325@amdcad.UUCP>
Date: Mon, 30-Mar-87 17:33:15 EST
Article-I.D.: amdcad.15325
Posted: Mon Mar 30 17:33:15 1987
Date-Received: Thu, 2-Apr-87 00:51:20 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <909@spar.SPAR.SLB.COM>
Reply-To: bcase@amdcad.UUCP (Brian Case)
Distribution: na
Organization: Advanced Micro Devices, Sunnyvale, California
Lines: 415

In article <236@winchester.mips.UUCP> mash@winchester.UUCP (John Mashey) writes:
>For example, Brian's first posting on this topic didn't mention that the
>AMD was a WORD-ADDRESSED machine, that comment was later. It wouldn't
>surprise me at all that the special string instruction is quite worthwhile
>for such machines, even if it's probably [from the evidence I've got
>so far] below the cutoff on ours [although still useful enough not to
>reject without thought.]

Like oops: the fact that we use the 29000 as a word-addressed machine
in our simulations is so familiar to me that I forgot to mention it as
an assumption. It is important to note that the 29000 can be used as
either a word-addressed or a byte-addressed machine: there are 3
"user bits" in the load/store control fields that simply drive some pins
on the chip. Thus, they can be used to encode things like load/store
byte/halfword/word if a system implementor so chooses. However, we think
that is stupid, because such things slow down memory systems and thus the
entire system. But if you really want to....

>As noted: the instruction is almost certainly worth something on the AMD part.
>[The word-addressing thing is a whole separate issue that's worth a LOT
>of discussion, but we might as well finish this one first.]

Yes, I would really like to get some opinions/facts from the net
regarding the word-address/byte-address issue.
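
To make the word-addressed case concrete, here is a minimal C sketch of
what a byte read and a byte write cost when memory only moves whole
32-bit words. It is an illustration, not 29000 code: the names load_byte
and store_byte are hypothetical, the array mem stands in for memory, and
big-endian byte order within a word is an assumption.

```c
#include <stdint.h>

/* Hypothetical helpers: byte access on a memory that only transfers
   whole 32-bit words. Big-endian byte order is an assumption here. */

/* Read the byte at byte address addr: fetch the containing word,
   then shift the wanted byte down to the low 8 bits. */
static uint8_t load_byte(const uint32_t *mem, uint32_t addr)
{
    uint32_t word  = mem[addr >> 2];            /* word holding the byte */
    unsigned shift = 8u * (3u - (addr & 3u));   /* byte 0 is the high byte */
    return (uint8_t)(word >> shift);
}

/* Write one byte: read the containing word, replace the one byte,
   and write the whole word back (a read-modify-write). */
static void store_byte(uint32_t *mem, uint32_t addr, uint8_t value)
{
    unsigned shift = 8u * (3u - (addr & 3u));
    uint32_t mask  = (uint32_t)0xFFu << shift;
    uint32_t word  = mem[addr >> 2];

    word = (word & ~mask) | ((uint32_t)value << shift);
    mem[addr >> 2] = word;
}
```

The point of the sketch is the cost difference: a byte read is one word
load plus a shift, while a byte write needs a load, a merge, and a store.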
>Is it possible for you to post the code for strcmp/strlen?
>(It would be especially interesting to see how you handle pointers
>to bytes & shorts).

Yes! Remember: this code was generated by hand-tweaking the output of
my C compiler! Thus, it isn't the best. (Please don't hurt my feelings by
complaining about the quality of my compiler. I am quite aware of its
shortcomings.) I am posting both the C source and the Am29000 code. Most
of it should be easy to understand, but note that extract is the
32-bits-from-64-bits funnel-shift function, and that the instruction
format is <op> <dest>,<source1>,<source2> (source2 can be an 8-bit
positive constant).

Well, this is so damn long that I'll just post strcpy. strcmp is easy
to write once you see how this works, and strlen is something we haven't
bothered with yet (because nothing has needed it). Sorry in advance about
the length. If you have better algorithms than the one I have implemented
below, please tell me (via mail). I'll summarize to the net if appropriate.

strcpy (s, t)
char *s, *t;
{
    int word0, word1, word2;

    if (((int)s & 3) == 0) {
        if (((int)t & 3) == 0) {
            /* can do fastest */
            /* has_a_zero is the compare-bytes instruction */
            while (has_a_zero (word0 = *(int *)t) == 0) {
                *(int *)s = word0;
                s += 4;
                t += 4;
            }
            while (*s = *t)
                ++s, ++t;
            return;
        }
        /* destination is word aligned, but source isn't */
        word0 = *(int *)t;
        while (1) {
            t += 4;
            word1 = *(int *)t;
            /* bit_extract is the funnel-shift instruction */
            word2 = bit_extract (word0, word1, ((int)t & 3) << 3);
            if (has_a_zero (word2))
                break;
            *(int *)s = word2;
            s += 4;
            word0 = word1;
        }
        t -= 4;
        while (*s = *t)
            ++s, ++t;
        return;
    }
    /* both strings unaligned */
    if (((int)s & 3) == ((int)t & 3)) {
        /* both strings have the same mis-alignment */
        while (1) {
            if ((*s = *t) == 0)
                return;
            ++s, ++t;
            if (((int)s & 3) == 0)
                break;
        }
        /* now transfer a word at a time */
        while (1) {
            if (has_a_zero (word0 = *(int *)t))
                break;
            *(int *)s = word0;
            s += 4;
            t += 4;
        }
        while (*s = *t)
            ++s, ++t;
        return;
    }
    /* neither string aligned and both have different alignment */
    while (1) {
        if ((*s = *t) == 0)
            return;
        ++s, ++t;
        if (((int)s & 3) == 0)
            break;
    }
    word0 = *(int *)t;
    while (1) {
        t += 4;
        word1 = *(int *)t;
        word2 = bit_extract (word0, word1, ((int)t & 3) << 3);
        if (has_a_zero (word2))
            break;
        *(int *)s = word2;
        s += 4;
        word0 = word1;
    }
    t -= 4;
    while (*s = *t)
        ++s, ++t;
    return;
}

----------------------------------------
$L0:
    .use data
    .align
    .use code
    .global _strcpy
_strcpy:
    sub gr01,gr01,40
    asgeu OVERTRAP,gr01,gr40
    add lr01,gr01,56
;.6 "strcpy.c"
    and lr08,lr0c,3
    eq lr08,lr08,0
    jmpf lr08,$16
    and lr08,lr0c,3
;.8 "strcpy.c"
    and lr08,lr0d,3
    eq lr08,lr08,0
    jmpf lr08,$17
    or lr00,lr00,0
    jmp $LI1
    or lr00,lr00,0
$LT19:
;.14 "strcpy.c"
    store 16,lr05,lr0c
;.15 "strcpy.c"
    add lr0c,lr0c,4
;.16 "strcpy.c"
    add lr0d,lr0d,4
$LI1:
;.12 "strcpy.c"
    load 16,lr05,lr0d
    cpbyte lr02,lr05,0
    neq lr08,lr02,0
    jmpf lr08,$LT19
    or lr00,lr00,0
$20:
    jmp $LI2
    or lr00,lr00,0
$LT21:
;.19 "strcpy.c"
    add lr0c,lr0c,1
    add lr0d,lr0d,1
$LI2:
;.18 "strcpy.c"
    load 16,lr08,lr0d
    byteex lr08,lr08,0
    load 16,lr09,lr0c
    bytein lr09,lr09,lr08
    store 16,lr09,lr0c
    eq lr08,lr08,0
    jmpf lr08,$LT21
    or lr00,lr00,0
$22:
    add gr01,gr01,40
    or lr00,lr00,0
    jmpi lr00
    asleu UNDERTRAP,lr01,gr41
$17:
;.25 "strcpy.c"
    load 16,lr05,lr0d
$LT23:
;.28 "strcpy.c"
    add lr0d,lr0d,4
;.29 "strcpy.c"
    load 16,lr06,lr0d
;.30 "strcpy.c"
    and lr08,lr0d,3
    lls lr04,lr08,3
    mtsp FS,lr04
    extract lr07,lr05,lr06
;.31 "strcpy.c"
    cpbyte lr02,lr07,0
    neq lr08,lr02,0
    jmpf lr08,$26
    or lr00,lr00,0
    jmp $24
    or lr00,lr00,0
$26:
;.34 "strcpy.c"
    store 16,lr07,lr0c
;.35 "strcpy.c"
    add lr0c,lr0c,4
;.36 "strcpy.c"
    jmp $LT23
    add lr05,lr06,0
$24:
;.38 "strcpy.c"
    jmp $LI3
    sub lr0d,lr0d,4
$LT27:
;.40 "strcpy.c"
    add lr0c,lr0c,1
    add lr0d,lr0d,1
$LI3:
;.39 "strcpy.c"
    load 16,lr08,lr0d
    byteex lr08,lr08,0
    load 16,lr09,lr0c
    bytein lr09,lr09,lr08
    store 16,lr09,lr0c
    eq lr08,lr08,0
    jmpf lr08,$LT27
    or lr00,lr00,0
$28:
    add gr01,gr01,40
    or lr00,lr00,0
    jmpi lr00
    asleu UNDERTRAP,lr01,gr41
$16:
;.46 "strcpy.c"
    and lr09,lr0d,3
    eq lr08,lr08,lr09
    jmpf lr08,$29
    or lr00,lr00,0
$LT30:
;.52 "strcpy.c"
    load 16,lr08,lr0d
    byteex lr08,lr08,0
    load 16,lr09,lr0c
    bytein lr09,lr09,lr08
    store 16,lr09,lr0c
    eq lr08,lr08,0
    jmpf lr08,$32
    or lr00,lr00,0
    add gr01,gr01,40
    or lr00,lr00,0
    jmpi lr00
    asleu UNDERTRAP,lr01,gr41
$32:
;.54 "strcpy.c"
    add lr0c,lr0c,1
    add lr0d,lr0d,1
;.55 "strcpy.c"
    and lr08,lr0c,3
    eq lr08,lr08,0
    jmpf lr08,$33
    or lr00,lr00,0
    jmp $31
    or lr00,lr00,0
$33:
    jmp $LT30
    or lr00,lr00,0
$31:
$LT34:
;.62 "strcpy.c"
    load 16,lr05,lr0d
    cpbyte lr02,lr05,0
    neq lr08,lr02,0
    jmpf lr08,$36
    or lr00,lr00,0
    jmp $35
    or lr00,lr00,0
$36:
;.64 "strcpy.c"
    store 16,lr05,lr0c
;.65 "strcpy.c"
    add lr0c,lr0c,4
;.66 "strcpy.c"
    jmp $LT34
    add lr0d,lr0d,4
$35:
    jmp $LI4
    or lr00,lr00,0
$LT37:
;.69 "strcpy.c"
    add lr0c,lr0c,1
    add lr0d,lr0d,1
$LI4:
;.68 "strcpy.c"
    load 16,lr08,lr0d
    byteex lr08,lr08,0
    load 16,lr09,lr0c
    bytein lr09,lr09,lr08
    store 16,lr09,lr0c
    eq lr08,lr08,0
    jmpf lr08,$LT37
    or lr00,lr00,0
$38:
    add gr01,gr01,40
    or lr00,lr00,0
    jmpi lr00
    asleu UNDERTRAP,lr01,gr41
$29:
$LT39:
;.77 "strcpy.c"
    load 16,lr08,lr0d
    byteex lr08,lr08,0
    load 16,lr09,lr0c
    bytein lr09,lr09,lr08
    store 16,lr09,lr0c
    eq lr08,lr08,0
    jmpf lr08,$41
    or lr00,lr00,0
    add gr01,gr01,40
    or lr00,lr00,0
    jmpi lr00
    asleu UNDERTRAP,lr01,gr41
$41:
;.79 "strcpy.c"
    add lr0c,lr0c,1
    add lr0d,lr0d,1
;.80 "strcpy.c"
    and lr08,lr0c,3
    eq lr08,lr08,0
    jmpf lr08,$42
    or lr00,lr00,0
    jmp $40
    or lr00,lr00,0
$42:
    jmp $LT39
    or lr00,lr00,0
$40:
;.83 "strcpy.c"
    load 16,lr05,lr0d
$LT43:
;.86 "strcpy.c"
    add lr0d,lr0d,4
;.87 "strcpy.c"
    load 16,lr06,lr0d
;.88 "strcpy.c"
    and lr08,lr0d,3
    lls lr04,lr08,3
    mtsp FS,lr04
    extract lr07,lr05,lr06
;.89 "strcpy.c"
    cpbyte lr02,lr07,0
    neq lr08,lr02,0
    jmpf lr08,$45
    or lr00,lr00,0
    jmp $44
    or lr00,lr00,0
$45:
;.92 "strcpy.c"
    store 16,lr07,lr0c
;.93 "strcpy.c"
    add lr0c,lr0c,4
;.94 "strcpy.c"
    jmp $LT43
    add lr05,lr06,0
$44:
;.96 "strcpy.c"
    jmp $LI5
    sub lr0d,lr0d,4
$LT46:
;.98 "strcpy.c"
    add lr0c,lr0c,1
    add lr0d,lr0d,1
$LI5:
;.97 "strcpy.c"
    load 16,lr08,lr0d
    byteex lr08,lr08,0
    load 16,lr09,lr0c
    bytein lr09,lr09,lr08
    store 16,lr09,lr0c
    eq lr08,lr08,0
    jmpf lr08,$LT46
    or lr00,lr00,0
$47:
    add gr01,gr01,40
    or lr00,lr00,0
    jmpi lr00
    asleu UNDERTRAP,lr01,gr41
    .use data
    .align
-----------------------------------------------------
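
The C source above calls has_a_zero and bit_extract, which stand for the
compare-bytes and funnel-shift instructions and so have no C definition
in the post. For trying the algorithm on another machine, a portable
sketch of what they compute might look like the following. The
assumptions are mine, not the post's: 32-bit words, big-endian byte
order (the byte at the lower address is the more significant one), and a
shift count below 32. The bit trick in has_a_zero is a common software
substitute, not a description of how the hardware does it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the compare-bytes instruction with a zero operand:
   returns nonzero if any of the four bytes of w is zero.
   A byte becomes 0x80-flagged only if subtracting 1 borrowed into its
   top bit while that top bit was originally clear. */
static int has_a_zero(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Stand-in for the funnel shift: treat hi:lo as one 64-bit value,
   shift it left by nbits, and return the upper 32 bits.
   Assumes nbits is 0, 8, 16 or 24 in the strcpy use (any value
   below 32 works). */
static uint32_t bit_extract(uint32_t hi, uint32_t lo, unsigned nbits)
{
    assert(nbits < 32);
    if (nbits == 0)
        return hi;              /* avoid the undefined 32-bit shift of lo */
    return (hi << nbits) | (lo >> (32u - nbits));
}
```

With a source pointer whose low two bits are k, a count of k << 3 makes
bit_extract return the four bytes that start k bytes into the first
word, which is how the unaligned-source loops use it.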