Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!brl-adm!adm!lcc.rich-wiz@CS.UCLA.EDU
From: lcc.rich-wiz@CS.UCLA.EDU (Richard Mathews)
Newsgroups: comp.lang.c
Subject: Re: String Processing Instruction
Message-ID: <6972@brl-adm.ARPA>
Date: Sun, 19-Apr-87 22:58:32 EST
Article-I.D.: brl-adm.6972
Posted: Sun Apr 19 22:58:32 1987
Date-Received: Mon, 20-Apr-87 01:44:55 EST
Sender: news@brl-adm.ARPA
Lines: 38

> In article <693@jenny.cl.cam.ac.uk> am@cl.cam.ac.uk (Alan Mycroft) writes:
> >You might be interested to know that such detection of null bytes in words
> >can be done in 3 or 4 instructions on almost any hardware (nay even in C).
> >(Code that follows relies on x being a 32 bit unsigned (or 2's complement
> >int with overflow ignored)...)
> >     #define has_nullbyte_(x) ((x - 0x01010101) & ~x & 0x80808080)
> >Then if e is an expression without side effects (e.g. variable)
> >     has_nullbyte_(e)
> >is nonzero iff the value of e has a null byte.
> I was so impressed by this new trick (well, to *me* it is new:-)
> that I immediately decided to try it.  On my Whitechapel MG-1,
> a 32016 based machine, the results were impressive.

Be careful if you use this.  It does not work correctly for strings
near the end of memory.  Consider an (unaligned) string which is 21
bytes long and is right at the end of data space (or any data segment
on machines which have such nonsense).  The posted code will fault on
the last read, because that 4-byte read runs past the end of the
string and so past the end of the data space.

On machines where data space is always a multiple of 4 bytes long, this
method will work if you first copy enough bytes, one at a time, to get
the source pointer to a longword boundary and then start copying 4
bytes at a time.

Someone else claimed that the has_nullbyte macro above does not work
correctly if x contains a byte equal to 0x80.  I do not see this.
Consider x==0x17801717:
        x - 0x01010101           == 0x167f1616
        ~x                       == 0xe87fe8e8
        (x - 0x01010101) & ~x    == 0x007f0000
        has_nullbyte(x)          == 0x00000000

The claim that has_nullbyte(x) will be TRUE appears to be incorrect.
(I was able to prove to myself that the macro always gives the correct
answer, but I won't bother to write it up and post it.)

Richard M. Mathews
Locus Computing Corporation
lcc.richard@LOCUS.UCLA.EDU
lcc.richard@UCLA-CS
{ihnp4,trwrb}!lcc!richard
{randvax,sdcrdcf,ucbvax,trwspp}!ucla-cs!lcc!richard
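
For anyone who wants to try the two points made in this article, here
is a rough sketch in C.  It is an illustration only, not the code that
was posted in the thread.  It assumes a compiler that provides
<stdint.h> (uint32_t, uintptr_t) and a machine where aligned 4-byte
reads are safe, i.e. data space is a multiple of 4 bytes long.  The
names ONES, HIGHS, has_nullbyte_slow, count_mismatches,
check_worked_example and word_strlen are made up for this example.
Note that reading whole words beyond the terminating null is outside
what the C language itself guarantees; it relies on the machine
behaving as described in the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x01010101u   /* 0x01 in every byte (hypothetical name) */
#define HIGHS 0x80808080u   /* 0x80 in every byte (hypothetical name) */

/* Mycroft's test as a function: nonzero iff some byte of x is zero. */
static uint32_t has_nullbyte(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Reference answer: look at each of the four bytes separately. */
static int has_nullbyte_slow(uint32_t x)
{
    int i;

    for (i = 0; i < 4; i++)
        if (((x >> (8 * i)) & 0xffu) == 0)
            return 1;
    return 0;
}

/* Compare the two on every word built from a few boundary byte values
 * (including 0x80).  Returns the number of disagreements. */
static unsigned long count_mismatches(void)
{
    static const uint32_t b[] = { 0x00, 0x01, 0x17, 0x7f, 0x80, 0x81, 0xfe, 0xff };
    const int n = (int)(sizeof b / sizeof b[0]);
    unsigned long bad = 0;
    int i, j, k, l;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                for (l = 0; l < n; l++) {
                    uint32_t x = (b[i] << 24) | (b[j] << 16) | (b[k] << 8) | b[l];
                    if ((has_nullbyte(x) != 0) != (has_nullbyte_slow(x) != 0))
                        bad++;
                }
    return bad;
}

/* The arithmetic from the article, for x == 0x17801717. */
static void check_worked_example(void)
{
    uint32_t x = 0x17801717u;

    assert((uint32_t)(x - ONES) == 0x167f1616u);
    assert((uint32_t)~x == 0xe87fe8e8u);
    assert((uint32_t)((x - ONES) & ~x) == 0x007f0000u);
    assert(has_nullbyte(x) == 0);
}

/* strlen that steps one byte at a time up to a 4-byte boundary and
 * only then reads whole words, so no word read straddles the end of a
 * data space whose size is a multiple of 4. */
static size_t word_strlen(const char *s)
{
    const char *p = s;
    uint32_t w;

    while (((uintptr_t)p & 3u) != 0) {      /* unaligned prefix */
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    for (;;) {                              /* aligned words */
        memcpy(&w, p, sizeof w);
        if (has_nullbyte(w))
            break;
        p += sizeof w;
    }
    while (*p != '\0')                      /* find the null in this word */
        p++;
    return (size_t)(p - s);
}
```

Calling check_worked_example() reproduces the figures given above, and
count_mismatches() compares the macro with the byte-by-byte answer on
a small set of words that includes 0x80 bytes.  The test only has to
say whether a null byte is present, not where, which is why
word_strlen finishes with a byte scan of the last word.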