Path: utzoo!mnetor!uunet!oddjob!hao!ames!nrl-cmf!cmcl2!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.lang.c
Subject: Re: Bug in ANSI C??
Message-ID: <7286@brl-smoke.ARPA>
Date: 19 Feb 88 00:05:43 GMT
References: <5331@cit-vax.Caltech.Edu> <241@oracle.UUCP> <2118@bsu-cs.UUCP> <16@dcs.UUCP> <10095@ulysses.homer.nj.att.com>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Distribution: comp.sys.ibm.pc,comp.lang.c
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 35
Keywords: memcmp, memmove, strcmp, memcmp

In article <10095@ulysses.homer.nj.att.com> cjc@ulysses.homer.nj.att.com (Chris Calabrese[rs]) writes:
>In article <16@dcs.UUCP>, wnp@dcs.UUCP writes:
>> In article <2118@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>>  >In article <241@oracle.UUCP> rbradbur@oracle.UUCP (Robert Bradbury) writes:
>>  >>On another note; does everyone realize that the current standard allows
>>  >>the results of the str/memcmp() function to be implementation defined
>>  >>if the characters being compared have the high-bit set?
>> The purpose of this would be to allow the use of the "alternate" character
>> set (= codes > 127) to be used for international language applications.
>>...
>If ansi wants this to really work, they'll have to allow for
>16 bit char's, the standard in Japanese and Chinese language
>word processors.  There is still a problem with
>using the 8th bit, as many machines generate strict parity
>for character work.

This discussion has gone off onto completely the wrong track.  The reason
for allowing the indeterminacy in the sign of strcmp()'s return value when
the differing characters have the high bit set is simply that C "plain"
chars may be either signed or unsigned, depending on the implementation,
and existing implementations in fact differ on this.
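To make the point concrete, here is a minimal sketch.  The names
naive_strcmp() and unsigned_strcmp() are hypothetical, not library
routines.  The first compares through plain char, so for a byte such as
0xE9 the sign of its result depends on whether the implementation's char
is signed; the second interprets each byte as unsigned char and gives the
same sign everywhere.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: compares through plain char, so for bytes above
   127 the sign of the result depends on whether char is signed. */
int naive_strcmp(const char *s1, const char *s2)
{
    assert(s1 && s2);
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    return *s1 - *s2;   /* plain chars promote to int and may be negative */
}

/* Hypothetical helper: compares bytes as unsigned char, so the sign of
   the result is the same on every implementation. */
int unsigned_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    assert(s1 && s2);
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return *p1 - *p2;   /* both operands are non-negative */
}
```

With a typical compiler where char is signed, naive_strcmp("\xE9", "a")
comes out negative; where char is unsigned it comes out positive.
unsigned_strcmp() returns a positive value in both cases.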

Characters of the required C source character set must have positive
values when stored in a char, although additional characters in an
implementation can appear negative.  This means that an 8-bit EBCDIC
implementation would have to make "plain" chars act like unsigned chars,
for example, since EBCDIC codes for letters and digits have the high bit
set ('0' is 0xF0).
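A small sketch of that requirement, assuming 8-bit chars.  The helpers
basic_chars_positive() and fits_in_signed_char() are hypothetical: the
first checks that each printable basic source character (plus space) is
positive as a plain char on the host, and the second shows why EBCDIC
forces the issue, since 0xF0 (240) exceeds SCHAR_MAX (127 with 8-bit
chars).

```c
#include <limits.h>
#include <stddef.h>

/* Printable members of the basic source character set, plus space.
   (Control characters such as tab and newline are omitted here.) */
static const char basic_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ ";

/* Hypothetical check: returns 1 if every listed character has a positive
   value when held in a plain char, as the language requires. */
int basic_chars_positive(void)
{
    size_t i;

    for (i = 0; basic_chars[i] != '\0'; i++)
        if (basic_chars[i] <= 0)
            return 0;
    return 1;
}

/* Hypothetical check: can this code value be a positive signed char?
   For EBCDIC '0' (0xF0) with 8-bit chars the answer is no, so such an
   implementation must make plain char unsigned. */
int fits_in_signed_char(int code)
{
    return code <= SCHAR_MAX;
}
```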

The proposed ANSI C provides adequate (but minimal) support for "multi-byte
characters" such as are used in Japan.  Note that this is not the same as
16-bit chars, which are permitted but would not usually be the implementor's
choice for those environments.  (Even though 16-bit chars are conceptually
and practically much cleaner than explicit multi-byte sequences,
implementors still want to be able to handle 8-bit data too, and don't
like the idea of wasting space in an international software release when
it is used in a country with an 8-bit character set.)
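As a rough illustration of working with multi-byte characters rather than
wide chars, here is a sketch using the library function mblen() to count
characters in a multi-byte string in the current locale.  The helper name
count_mb_chars() is hypothetical; in the default "C" locale every byte
counts as one character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the multi-byte characters in s according
   to the current locale.  Returns -1 on an invalid byte sequence. */
long count_mb_chars(const char *s)
{
    long count = 0;
    size_t remaining = strlen(s);
    int len;

    (void)mblen(NULL, 0);           /* reset any shift state */
    while (remaining > 0) {
        len = mblen(s, remaining);  /* bytes in the next character */
        if (len <= 0)               /* -1 means an invalid sequence */
            return -1;
        s += len;
        remaining -= (size_t)len;
        count++;
    }
    return count;
}
```

In a locale using a Japanese multi-byte encoding, a two-byte character
would advance s by 2 but add only 1 to the count.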