Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: behavior of read/get0 at end_of_file
Message-ID: <860@cresswell.quintus.UUCP>
Date: 9 Apr 88 00:49:49 GMT
References: <608> <1197@kulcs.kulcs.uucp> <783@cresswell.quintus.UUCP> <522@ecrcvax.UUCP>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 54
Keywords: get0, repeat, debugging

In article <522@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:
> Another point I want to make concerns the -1 returned by get0/1
> at eof: several people have claimed that it is portable
> and that it cannot be confused with any character, however
> it is *not* portable, since it relies on the fact that
> no valid character can be confused with -1. If characters are
> represented as strings of length 1, then -1 has a different type
> and so there is no confusion, but the eof value should have the same type
> (if nothing else then because of indexing). If characters are integers,
> taking -1 implies that no character can have the code 2^n - 1
> (n being the number of bits on which the character is stored)
> which is not necessarily true - you can use 7 bits for ASCII, 16 bits for
> Kanji and anything else on any number of bits. Only if we waste
> enough space we can guarantee that -1 will be different.
> A standard that forces you to waste space would really not be good.
>
> --Micha

Er, which character set standards allow a character to be represented by a negative number? I only know about ISO 646, ASCII, EBCDIC, ISO 8859, and XNS, and all of them define character codes to be non-negative integers. Perhaps someone from Japan could comment on the JIS codings; certainly XNS doesn't assign any Kanji a number that could be confused with a negative integer, even in 16-bit two's complement.
Wouldn't representing some characters by negative numbers mean that comparison of character codes would disagree with the collating order defined by the standard?

It is not the case that using -1 as the end-of-file mark means that no character can have the value 2^n-1. All it means is that the integer representation used by Prolog must contain at least one more bit than the number of bits used to *store* characters. (The whole point of the end-of-file marker is that it is a value which *can't* be stored: it can never be a valid character in a file, and it can't appear in the name of an atom or the text of a string.) This isn't much of a restriction. Even for XNS, which I believe includes all the JIS-required Kanji, 16 bits would suffice for Prolog integers.

We don't have to waste any space at all. For example, VM/PROLOG has two representations for integers: a compact one for 24-bit integers, and another for 32-bit integers. Similarly, a Prolog system for PCs using 16-bit "area" tags could have one tag for 16-bit positive integers, another for 16-bit negative integers, and a third for bigger integers represented indirectly. (This is what Interlisp-D does.)

I've used a programming language where the character input operation returned one type of object for ordinary characters and another type for end of file. It was amazingly painful: you always had to test for the end-of-file object before doing anything with the result, because character comparison etc. were not defined on the end-of-file object.

If characters are to be represented by strings of length 1 (what an utterly disgusting vomitously repulsive kludge), representing end-of-file markers by the empty string seems like the obvious thing. This representation would even make the end-of-file marker compare less than any valid `character', just as the -1 convention does now.