Path: utzoo!mnetor!uunet!oddjob!hao!ames!nrl-cmf!cmcl2!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.lang.c
Subject: Re: Bug in ANSI C??
Message-ID: <7286@brl-smoke.ARPA>
Date: 19 Feb 88 00:05:43 GMT
References: <5331@cit-vax.Caltech.Edu> <241@oracle.UUCP> <2118@bsu-cs.UUCP> <16@dcs.UUCP> <10095@ulysses.homer.nj.att.com>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) )
Distribution: comp.sys.ibm.pc,comp.lang.c
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 35
Keywords: memcmp, memmove, strcmp, memcmp

In article <10095@ulysses.homer.nj.att.com> cjc@ulysses.homer.nj.att.com (Chris Calabrese[rs]) writes:
>In article <16@dcs.UUCP>, wnp@dcs.UUCP writes:
>> In article <2118@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>> >In article <241@oracle.UUCP> rbradbur@oracle.UUCP (Robert Bradbury) writes:
>> >>On another note; does everyone realize that the current standard allows
>> >>the results of the str/memcmp() function to be implementation defined
>> >>if the characters being compared have the high-bit set?
>> The purpose of this would be to allow the use of the "alternate" character
>> set (= codes > 127) to be used for international language applications.
>>...
>If ansi wants this to really work, they'll have to allow for
>16 bit char's, the standard in Japanese and Chinese language
>word processors.  There is still a problem with
>using the 8th bit, as many machines generate strict parity
>for character work.

This discussion has gone onto completely the wrong track.  The reason
for allowing the indeterminacy in strcmp()'s return sign when the
differing characters have the high bit set is simply because that is
the way C "plain" chars are, so that is in fact how existing
implementations behave.  The C source characters are required to
appear positive, although other additional characters in an
implementation can appear negative.
This means that an 8-bit EBCDIC implementation would have to make
"plain" chars act like unsigned chars, for example.

The proposed ANSI C provides adequate (but minimal) support for
"multi-byte characters" such as are used in Japan.  Note that this
is not the same as 16-bit chars, which are permitted but would not
usually be the implementor's choice for those environments.  (Even
though 16-bit chars are conceptually and practically much cleaner
than explicit multi-byte sequences, implementors still want to be
able to handle 8-bit data too, and don't like the idea of wasted
space in an international software release when it is used in an
8-bit character country.)
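
To make the point about the return sign concrete, here is a rough sketch
of the two behaviors an implementation might exhibit.  The helper names
cmp_as_signed and cmp_as_unsigned are hypothetical and are not part of
any library; the sketch also assumes 8-bit bytes, two's complement, and
an ASCII execution character set.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): compare n bytes,
   viewing each byte the way a signed "plain" char implementation would. */
int cmp_as_signed(const void *a, const void *b, size_t n)
{
    const signed char *p = a;
    const signed char *q = b;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != q[i])
            return (p[i] < q[i]) ? -1 : 1;
    }
    return 0;
}

/* Hypothetical helper: the same comparison, viewing each byte the way
   an unsigned "plain" char implementation would. */
int cmp_as_unsigned(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != q[i])
            return (p[i] < q[i]) ? -1 : 1;
    }
    return 0;
}
```

Under the stated assumptions, comparing the bytes { 'a', 0xE9 } against
{ 'a', 'z' } gives a negative result from cmp_as_signed and a positive
one from cmp_as_unsigned.  The two helpers agree whenever the first
differing bytes both have the high bit clear, which is why the sign is
only in question for high-bit characters.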