Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!uunet!munnari.oz.au!cs.mu.oz.au!ok
From: ok@cs.mu.oz.au (Richard O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: Non-ASCII characters, suggestion and question
Message-ID: <2449@munnari.oz.au>
Date: 17 Oct 89 06:02:17 GMT
References: <2422@munnari.oz.au> <1067@gould.doc.ic.ac.uk>
Sender: news@cs.mu.oz.au
Lines: 71

In article <1067@gould.doc.ic.ac.uk>, cdsm@sappho.doc.ic.ac.uk (Chris Moss) writes:
> Richard O'Keefe writes:
> >Consider the problems of someone trying to write Prolog code
> >which handles words in a language other than English.  

> Your message prompted me to look at the latest Japanese proposal that
> was sent out by Roger Scowen on 2 Oct, just before the Ottawa meeting
> of the ISO Prolog standardization committee.
> (Richard, they sent out your comments on I/O in the same mailing)

If "comments on I/O" means the note I wrote to Roger Scowen
pointing out that "current input and current output are always
valid streams, no matter how files are closed" is an important
invariant whose preservation ought to be explicitly demanded by
the standard, that was PRIVATE MAIL, not intended for publication,
and it was distributed without my knowledge or permission.
I have already taken a lot of flak from Quintus because they
thought I was attacking LPA (which I wasn't, quite the opposite).

Thanks to Chris Moss for posting his comments.

I really don't see what is supposed to be so hard about Kanji.
Quintus Prolog supported Kanji on the Xerox Lisp machines (well, it
still does if anyone is supporting the hardware...) and supports Kanji
under Vax/VMS and Vax/Ultrix, and may do so on other systems by now.

When Quintus did that, the C standard hadn't tackled multi-octet
(why OCTet? why can't I have an 18-bit character set?) characters.
Now that "wide" characters ARE tackled in the C standard (wchar_t
and friends), it is extremely important that whatever is decided
for Prolog should not be too different from C (for the simple reason
that Prolog and C programs will have to read each other's files).
I suggest that the BSI/ISO committee should extract the relevant
parts of the current ANSI C draft (with ANSI's permission, of course)
and mail the extracts to the Prolog standard mailing list.

The problem of dealing with a SINGLE character set (whether it be 7 bit,
8 bit, or 16 bit) is fairly straightforward.  The problem I am concerned
with is porting source code for any one Western European language between
the three incompatible 8-bit character sets we already have.

> 2. Collating sequence. It suggests the standard should only define an
> alphabetical ordering within three groups of characters - small letters,
> capital letters and digits. Anything else is based on an extended
> collating sequence which is implementation defined.

This is silly.  Different European languages collate the same symbols
differently.  (Think about the Spanish rule for "ll".)  If you want
locale-dependent collating, you are talking about a relation between
character SEQUENCES, not single characters.  Since 1987 at the latest
I have been saying that the Prolog standard ought to have two separate
comparison predicates:
	compare(R, X, Y)
		-- as at present, where the relative order of two texts
		   of the same type is the same as the relative order of
		   the lists of integers representing their names
	collate(R, X, Y)
		-- locale-dependent ordering, relative order of texts is
		   not necessarily reducible to an ordering on characters;
		   should sort lower and upper case together, e.g.
		   stra\:sse and STRASSE should sort next to each other.
		   (Yes, one of those words is a character shorter than
		   the other, but they differ only in case...)

The distinction is of great practical importance:  to obtain fast Prolog
programs in a wide range of applications we *MUST* have ***FAST***
comparison.  collate/3 is likely to be slow.  So setof/3 should use the
fast comparison.
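
To make the compare/collate distinction concrete, here is a minimal
sketch of what a collate/3 might look like.  It is an illustration under
stated assumptions, not a proposal for the standard's text: collate/3,
collation_key/2, fold_codes/2 and fold_code/3 are hypothetical names, the
only folding rules are "lower ASCII capitals" and "expand sharp s (code
223 in ISO 8859-1) to ss", and a real implementation would consult
locale-specific tables instead.  atom_codes/2 is assumed to be available
(older systems would use name/2).

```prolog
% collate(?Order, +X, +Y)
% Hypothetical sketch: orders two atoms by a folded collation key,
% so the result depends on character SEQUENCES, not single characters.
collate(Order, X, Y) :-
    collation_key(X, KX),
    collation_key(Y, KY),
    compare(Order, KX, KY).

% collation_key(+Atom, -Key)
% Key is the list of character codes of Atom after folding.
% The folding rules below are illustrative assumptions only.
collation_key(Atom, Key) :-
    atom_codes(Atom, Codes),
    fold_codes(Codes, Key).

fold_codes([], []).
fold_codes([C|Cs], Key) :-
    fold_code(C, Key, Rest),
    fold_codes(Cs, Rest).

% fold_code(+Code, -Key, ?Tail)
% Key-Tail is a difference list holding the folded form of Code.
fold_code(223, [0's, 0's|T], T) :- !.          % sharp s -> "ss"
fold_code(C, [L|T], T) :-                      % ASCII capital -> small
    C >= 0'A, C =< 0'Z, !,
    L is C + 32.
fold_code(C, [C|T], T).                        % anything else unchanged
```

With an atom built from the codes [115,116,114,97,223,101] (the word
with sharp s), compare/3 against 'STRASSE' reports an order determined
by the first differing code, while this collate/3 reports the two as
equal.  Every call to collate/3 walks and rebuilds both names, which is
the cost that setof/3 should not be made to pay.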

[My postings to this group on this topic may be reproduced by anyone for
 any purpose.]