Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!aplcen!haven!rutgers!bellcore!att!cbnews!cblpf!mark
From: mark@cblpf.ATT.COM (Mark Horton)
Newsgroups: comp.std.c
Subject: Re: Must sizeof(int) exceed sizeof(char) in hosted environments?
Message-ID: <9487@cbnews.ATT.COM>
Date: 12 Sep 89 15:34:29 GMT
References: <1989Aug29.204254.3307@sq.sq.com> <1713@cbnewsl.ATT.COM> <10908@smoke.BRL.MIL>
Sender: nntp@cbnews.ATT.COM
Reply-To: mark@cblpf.ATT.COM (Mark Horton)
Organization: AT&T Bell Laboratories, Columbus
Lines: 34

In article <10908@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <1713@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>>At this point, I'd be happier if there were a requirement that EOF be
>>distinct from all other values possible to return from fgetc!
>
>The very issue you discussed arose at an X3J11 meeting, in off-line
>discussion with Jervis, myself, and someone else (as I recall). My
>dim recollection is that we decided EOF didn't have to be distinct
>if sizeof(int)==sizeof(char), and so far as we could tell the latter
>is allowed. This agrees with your conclusions. I would rather
>construe the description of EOF as requiring that it be distinct,
>for the obvious reasons.
>
>Yet another matter for the "interpretations" phase?

The obvious reason to want big characters (other than on tiny 8-bit
machines) is to support Eastern character sets, such as Japanese Kanji.
There are several encodings of Kanji, generally in 16 bits. While they
don't use all 65,536 possible combinations, they do use all 16 bits.
As I recall, 7-bit ASCII, 8-bit European, and 16-bit Kanji characters
can be interspersed, and can be told apart by looking at the high bit
of each byte in a 16-bit unit:

	0/0        => ASCII/ASCII
	0/1 or 1/0 => ASCII/Eur or Eur/ASCII
	1/1        => a single Kanji character in the remaining 14 bits
I suspect (but am not sure) that FFFF is unused, which would make EOF
likely to be distinct, but that value could still appear in a file. I
would discourage any implementation of unsigned from ignoring or
clearing the high bit. I think the "assume you won't see the EOF bits
in the file" approach is right for the implementation, while it's
better for the application to use feof() instead of comparing against
EOF.

By the way, some other character sets (such as Chinese) don't fit in
16 bits. Assuming, since int=long, that characters will always be
smaller than int may not be safe.