Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!aplcen!haven!rutgers!bellcore!att!cbnews!cblpf!mark
From: mark@cblpf.ATT.COM (Mark Horton)
Newsgroups: comp.std.c
Subject: Re: Must sizeof(int) exceed sizeof(char) in hosted environments?
Message-ID: <9487@cbnews.ATT.COM>
Date: 12 Sep 89 15:34:29 GMT
References: <1989Aug29.204254.3307@sq.sq.com> <1713@cbnewsl.ATT.COM> <10908@smoke.BRL.MIL>
Sender: nntp@cbnews.ATT.COM
Reply-To: mark@cblpf.ATT.COM (Mark Horton)
Organization: AT&T Bell Laboratories, Columbus
Lines: 34

In article <10908@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <1713@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>>At this point, I'd be happier if there were a requirement that EOF be
>>distinct from all other values possible to return from fgetc!
>
>The very issue you discussed arose at an X3J11 meeting, in off-line
>discussion with Jervis, myself, and someone else (as I recall).  My
>dim recollection is that we decided EOF didn't have to be distinct
>if sizeof(int)==sizeof(char), and so far as we could tell the latter
>is allowed.  This agrees with your conclusions.  I would rather
>construe the description of EOF as requiring that it be distinct,
>for the obvious reasons.
>
>Yet another matter for the "interpretations" phase?

The obvious reason why you would want big characters (other than tiny
8 bit machines) is to support eastern character sets, such as the
Japanese Kanji.  There are several encodings of Kanji, generally in
16 bits.  While they don't use the entire 65K possible combinations,
they do use all 16 bits.  As I recall, 7 bit ASCII, 8 bit European,
and 16 bit Kanji characters can be interspersed, and can be recognized
by looking at the high bits of each byte: 0/0 => ASCII, 0/1 or 1/0 =>
ASCII/Eur or Eur/ASCII, 1/1 => a single Kanji character in the remaining
14 bits.  I suspect (but am not sure) that FFFF is unused, making EOF
likely to be distinct, but that bit pattern could still appear in a file.
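
To make the high-bit rule concrete, here is a rough sketch in C.  It
follows the scheme as I recalled it above, not any particular published
encoding, and the names (pair_kind, classify_pair, kanji_code) are made
up for illustration.

```c
/* Hypothetical names; the scheme is the one recalled above, not a
   specific published encoding. */
enum pair_kind { PAIR_ASCII_ASCII, PAIR_MIXED, PAIR_KANJI };

/* Classify two consecutive bytes by the high bit of each:
   0/0 => two ASCII characters,
   0/1 or 1/0 => one ASCII and one 8-bit European character,
   1/1 => a single Kanji character. */
enum pair_kind classify_pair(unsigned char first, unsigned char second)
{
    int h1 = (first & 0x80) != 0;
    int h2 = (second & 0x80) != 0;

    if (h1 && h2)
        return PAIR_KANJI;
    if (!h1 && !h2)
        return PAIR_ASCII_ASCII;
    return PAIR_MIXED;
}

/* For a 1/1 pair, pack the remaining 7 bits of each byte into a
   14-bit code (first byte is the more significant half). */
unsigned int kanji_code(unsigned char first, unsigned char second)
{
    return ((unsigned int)(first & 0x7F) << 7)
         | (unsigned int)(second & 0x7F);
}
```

Under this sketch the pair FF FF classifies as Kanji with code 0x3FFF.
That is the bit pattern that would collide with EOF if int were 16 bits
and EOF were -1 on a two's-complement machine.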

I would discourage any implementation of unsigned from ignoring or clearing
the high bit.  I think the "assume you won't see the EOF bits in the file"
approach is right for the implementation, while it's better for the
application to use feof instead of EOF.
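
For the application side, a minimal sketch of what I mean: when fgetc
returns something equal to EOF, ask feof and ferror whether the stream
has really ended before giving up.  The function name count_chars is
just for illustration.

```c
#include <stdio.h>

/* Count the characters in a stream.  A return value equal to EOF is
   treated as end of input only if feof() or ferror() confirms it, so
   an in-band character that happens to compare equal to EOF is still
   counted. */
long count_chars(FILE *fp)
{
    long n = 0;

    for (;;) {
        int c = fgetc(fp);
        if (c == EOF && (feof(fp) || ferror(fp)))
            break;
        n++;
    }
    return n;
}
```

On a machine where int is wider than char the extra test never changes
the result; it only matters where a character value can equal EOF.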

By the way, some other character sets (such as Chinese) don't fit in
16 bits.  Assuming that, because int=long, characters will always be
smaller than int may not be safe.