Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: behavior of read/get0 at end_of_file
Message-ID: <860@cresswell.quintus.UUCP>
Date: 9 Apr 88 00:49:49 GMT
References: <608> <1197@kulcs.kulcs.uucp> <783@cresswell.quintus.UUCP> <522@ecrcvax.UUCP>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 54
Keywords: get0, repeat, debugging

In article <522@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:
> Another point I want to make concerns the -1 returned by get0/1
> at eof: several people have claimed that it is portable
> and that it cannot be confused with any character, however
> it is *not* portable, since it relies on the fact that
> no valid character can be confused with -1. If characters are
> represented as strings of length 1, then -1 has a different type
> and so there is no confusion, but the eof value should have the same type
> (if nothing else then because of indexing). If characters are integers,
> taking -1 implies that no character can have the code 2^n - 1
> (n being the number of bits on which the character is stored)
> which is not necessarily true - you can use 7 bits for ASCII, 16 bits for
> Kanji and anything else on any number of bits. Only if we waste
> enough space we can guarantee that -1 will be different.
> A standard that forces you to waste space would really not be good.
> 
> --Micha

Er, which character set standards allow a character to be represented by
a negative number?  I only know about ISO 646, ASCII, EBCDIC, ISO 8859,
and XNS, and all of them define character codes to be non-negative integers.
Perhaps someone from Japan could comment on the JIS codings; certainly
XNS doesn't assign any Kanji a number which could be confused with a
negative integer, even in 16-bit two's complement.  Wouldn't representing
some characters by negative numbers mean that comparison of character
codes would disagree with the collating order defined by the standard?
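
To make the ordering worry concrete, here is a small Python sketch (not
Prolog).  It assumes 8-bit storage and takes the ISO 8859-1 code 0xE9
(e-acute) as the example; the helper name signed8 is made up for this
illustration.  Read as a signed byte, the code becomes negative and
sorts before `A', which is the opposite of the order of the code values.

```python
def signed8(byte: int) -> int:
    """Reinterpret an 8-bit code (0..255) as a two's complement signed byte."""
    return byte - 256 if byte >= 128 else byte

E_ACUTE = 0xE9   # e-acute in ISO 8859-1
LETTER_A = 0x41  # capital A

# Order by the code values as the standard assigns them (non-negative).
unsigned_order = sorted([E_ACUTE, LETTER_A])

# Order by the same codes misread as signed bytes.
signed_order = sorted([E_ACUTE, LETTER_A], key=signed8)
```

Under these assumptions unsigned_order puts A first, while signed_order
puts e-acute first, because signed8(0xE9) is negative.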

It is not the case that using -1 as end of file mark means that no
character can have the value 2^n-1.  All it means is that the
integer representation used by Prolog must contain at least one more
bit than the number of bits used to *store* characters.  (The whole
point of the end of file marker is that it is a value which *can't*
be stored:  it can never be a valid character in a file and it can't
appear in the name of an atom or the text of a string.)  This isn't
much of a restriction.  Even for XNS, which I believe includes all
the JIS-required Kanji, 16 bits would suffice for Prolog integers.
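
The "one more bit" argument can be checked mechanically.  The Python
sketch below (not Prolog) assumes characters are stored in 8 bits; the
helper as_signed and the constant names are invented for illustration.
It shows that 2^n - 1 and -1 collide only when the integer is no wider
than the stored character, and that one extra bit keeps every stored
code distinct from -1.

```python
def as_signed(pattern: int, bits: int) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement integer."""
    pattern &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern

CHAR_BITS = 8   # assumed number of bits used to *store* a character
EOF_MARK = -1   # the end of file mark under discussion

# In an integer only as wide as a stored character, the top code and -1 collide.
collides_when_same_width = as_signed(2**CHAR_BITS - 1, CHAR_BITS) == EOF_MARK

# With one extra bit, every stored code 0 .. 2^n - 1 stays non-negative.
codes_with_extra_bit = [as_signed(c, CHAR_BITS + 1) for c in range(2**CHAR_BITS)]
distinct_with_extra_bit = EOF_MARK not in codes_with_extra_bit
```

The same check goes through for any other value of CHAR_BITS; only the
width of the integer relative to the stored character matters.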

We don't have to waste any space at all.  For example, VM/PROLOG has
two representations for integers:  a compact one for 24-bit integers,
and another one for 32-bit integers.  Similarly, a Prolog system for
PCs using 16-bit "area" tags could have one tag for 16-bit positive
integers and another for 16-bit negative integers and a third for
bigger integers represented indirectly.  (This is what Interlisp-D does.)
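
A rough Python sketch of such a three-tag scheme follows.  It is an
assumption-laden illustration, not a description of VM/PROLOG or
Interlisp-D: the tag names, the 16-bit payload layout, and the list
standing in for indirectly stored integers are all hypothetical.  The
point is that the tag carries the sign, so a 16-bit payload can hold
every 16-bit character code and -1 still has its own representation.

```python
POS16, NEG16, BIG = "pos16", "neg16", "big"  # hypothetical tag names
heap = []  # stands in for storage of indirectly represented big integers

def encode(n: int):
    """Return a (tag, 16-bit payload) cell for the integer n."""
    if 0 <= n < (1 << 16):
        return (POS16, n)
    if -(1 << 16) <= n < 0:
        return (NEG16, n + (1 << 16))  # payload is n modulo 2^16
    heap.append(n)
    return (BIG, len(heap) - 1)        # payload is an index into heap

def decode(cell):
    """Recover the integer from a (tag, payload) cell."""
    tag, payload = cell
    if tag == POS16:
        return payload
    if tag == NEG16:
        return payload - (1 << 16)
    return heap[payload]
```

In this sketch encode(0xFFFF) and encode(-1) share the payload 0xFFFF
and differ only in the tag, so no payload bits are given up to keep the
end of file mark apart from the largest 16-bit code.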

I've used a programming language where the character input operation
returned one type of object for ordinary characters and another type
for end of file.  It was amazingly painful:  you always had to test
for the end of file object before doing anything with the result,
because character comparison &c were not defined on the end of file
object.  If characters are to be represented by strings of length 1
(what an utterly disgusting vomitously repulsive kludge), representing
end of file markers by the empty string seems like the obvious thing.
This representation would even make the end of file marker less than
any valid `character', which is what the -1 convention currently does.
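
The ordering property can be sketched in Python (not Prolog).  The
functions get0 and get_char below are stand-ins written for this
example, reading from a string rather than a stream; they are not any
Prolog system's built-ins.  Both end of file marks compare below every
valid character, and the ordinary comparisons in is_upper are defined
on the mark, so no separate type test is needed before using the result.

```python
EOF_CODE = -1  # end of file mark when characters are integer codes
EOF_STR = ""   # end of file mark when characters are length-1 strings

def get0(text: str, pos: int) -> int:
    """Return the code of text[pos], or EOF_CODE past the end."""
    return ord(text[pos]) if pos < len(text) else EOF_CODE

def get_char(text: str, pos: int) -> str:
    """Return text[pos] as a length-1 string, or EOF_STR past the end."""
    return text[pos:pos + 1]

def is_upper(code: int) -> bool:
    """Range test that is well defined even when code is EOF_CODE."""
    return ord("A") <= code <= ord("Z")

def count_upper(text: str) -> int:
    """Count capital letters, reading one code at a time until end of file."""
    count, pos = 0, 0
    code = get0(text, pos)
    while code != EOF_CODE:
        if is_upper(code):
            count += 1
        pos += 1
        code = get0(text, pos)
    return count
```

Here is_upper(EOF_CODE) is simply false, and the empty string compares
less than any one-character string, matching what -1 does for codes.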