Path: utzoo!mnetor!uunet!husc6!mit-eddie!uw-beaver!microsoft!jangr
From: jangr@microsoft.UUCP (Jan Gray)
Newsgroups: comp.sys.m68k
Subject: Re: 68000 Tricks
Message-ID: <1221@microsoft.UUCP>
Date: 4 Mar 88 19:04:55 GMT
References: <326@rose3.rosemount.com> <1200@microsoft.UUCP> <1629@uhccux.UUCP>
Reply-To: jangr@forward.UUCP (PUT YOUR NAME HERE)
Distribution: na
Organization: Microsoft Corporation, Redmond, WA
Lines: 66
Keywords: 68000 Tricks Speed bitcount strlen

>Jan Gray's strlen seems preferable, especially since it looks right for
>strings longer than 64K.  Has it actually been timed on an '020?  Seems
>like a 7-instruction loop for every four bytes isn't ALL that much
>better than a 2-instruction loop for every byte...  I like the simpler
>solution adapted to just compare pointers, so the loop is just:
>       loop:   tst.b   (An)+
>               bne.s   loop

I haven't actually timed it, but here are the "from the book" timings:

; traditional, string in a0, unrolled 4 times
;                                 68000   68020 b/c/w (best/cache/worst)
loop:   movb    (a0)+,d0        ; 8       4/6/7
        beq     end             ; 8       1/4/5
        movb    (a0)+,d0        ; 8       4/6/7
        beq     end             ; 8       1/4/5
        movb    (a0)+,d0        ; 8       4/6/7
        beq     end             ; 8       1/4/5
        movb    (a0)+,d0        ; 8       4/6/7
        bne     loop            ; 10      3/6/9
; total for 4 chars:            ; 66      22/42/52

; identity, string in a0, must be longword aligned on 68000
;                                 68000   68020 b/c/w
        movl    #$01010101,d2   ; 12      0/6/5
        movl    #$80808080,d3   ; 12      0/6/5
loop:   movl    (a0)+,d0        ; 12      4/6/7
        movl    d0,d1           ; 4       0/2/3
        subl    d2,d0           ; 8       0/2/3
        notl    d1              ; 6       0/2/3
        andl    d1,d0           ; 8       0/2/3
        andl    d3,d0           ; 8       0/2/3
        beq     loop            ; 10      3/6/9
; total for 4 chars:            ; 56      7/22/31

As you can see, byte-at-a-time is slower because it reads from memory four
times more often.  The speedup should be even more pronounced if your
memory subsystem adds wait states.

Note that if you hope to recode strlen on your typical 68020 UNIX box, you
will have to longword align a0; otherwise, calling strlen on an unaligned
string placed at the end of the last valid page of your process may cause
a segmentation fault.
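
For readers who want to try the "identity" loop off the 68000, here is a
minimal sketch in C of the same zero-byte test, with the alignment step
described above done first.  The names has_zero_byte and strlen_words are
made up for this illustration and are not from the original code.  It
assumes 32-bit longwords.  Like the assembly, it may read up to three
bytes past the terminator inside the same aligned longword; ISO C does
not guarantee that is allowed, so treat it as a demonstration of the
trick rather than a drop-in strlen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of w is zero.
 * C form of the subl / notl / andl / andl sequence:
 * (w - 0x01010101) & ~w & 0x80808080. */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - UINT32_C(0x01010101)) & ~w & UINT32_C(0x80808080);
}

/* Hypothetical helper: longword-at-a-time strlen.
 * ASSUMPTION: reading the whole aligned longword that holds the
 * terminator is safe (true when memory is mapped in aligned pages,
 * but outside what the C standard promises). */
size_t strlen_words(const char *s)
{
    const char *p = s;

    /* Step by bytes until p is longword aligned, so that no 4-byte
     * read can straddle the end of the last valid page. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Main loop: one memory read per four characters. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);    /* aligned 4-byte load */
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* The terminator is somewhere in this longword; find it by bytes. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The test only says whether a longword contains a zero byte, not which
one, so the sketch finishes with a short byte scan.  That also keeps it
independent of byte order.
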

If you try the analogue of this code on a 386, you will be amazed to
discover that it scans and copies strings faster than the 386 string
instructions themselves!  (Actually, this is not quite ALWAYS true: the
extra startup costs dominate for short unaligned strings.)  However, the
string instructions are usually preferable because they cause less bus
traffic.

[To the person doing Othello: consider converting from a base 4 (white,
black, empty, unused) representation to base 3 (white, black, empty) using
lookup tables.  Then many of your tables need only 3^x instead of 4^x
bytes.]

Jan Gray  uunet!microsoft!jangr  Microsoft Corp., Redmond Wash.  206-882-8080