Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!uunet!munnari.oz.au!cs.mu.oz.au!ok
From: ok@cs.mu.oz.au (Richard O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: Non-ASCII characters, suggestion and question
Message-ID: <2449@munnari.oz.au>
Date: 17 Oct 89 06:02:17 GMT
References: <2422@munnari.oz.au> <1067@gould.doc.ic.ac.uk>
Sender: news@cs.mu.oz.au
Lines: 71

In article <1067@gould.doc.ic.ac.uk>, cdsm@sappho.doc.ic.ac.uk (Chris Moss) writes:
> Richard O'Keefe writes:
> >Consider the problems of someone trying to write Prolog code
> >which handles words in a language other than English.
> Your message prompted me to look at the latest Japanese proposal that
> was sent out by Roger Scowen on 2 Oct, just before the Ottawa meeting
> of the ISO Prolog standardization committee.
> (Richard, they sent out your comments on I/O in the same mailing)

If the "comments on I/O" means the note I wrote to Roger Scowen
pointing out that "current input and current output are always valid
streams, no matter how files are closed" is an important invariant
whose preservation ought to be explicitly demanded by the standard,
that was PRIVATE MAIL, not intended for publication, and it was
distributed without my knowledge or permission.  I have already taken
a lot of flak from Quintus because they thought I was attacking LPA
(which I wasn't; quite the opposite).

Thanks to Chris Moss for posting his comments.

I really don't see what is supposed to be so hard about Kanji.
Quintus Prolog supported Kanji on the Xerox Lisp machines (well, it
still does, if anyone is supporting the hardware...) and supports
Kanji under Vax/VMS and Vax/Ultrix, and may do so on other systems
by now.  When Quintus did that, the C standard hadn't tackled
multi-octet characters.  (Why OCTet?  Why can't I have an 18-bit
character set?)
Now that "wide" characters ARE tackled in the C standard (wchar_t and
friends), it is extremely important that whatever is decided for
Prolog not be too different from C, for the simple reason that Prolog
and C programs will have to read each other's files.  I suggest that
the BSI/ISO committee extract the relevant parts of the current ANSI C
draft (with ANSI's permission, of course) and mail the extracts to the
Prolog standard mailing list.

The problem of dealing with a SINGLE character set (whether it be
7-bit, 8-bit, or 16-bit) is fairly straightforward.  The problem I am
concerned with is porting source code that handles words in any one
Western European language between the three incompatible 8-bit
character sets we already have.

> 2. Collating sequence. It suggests the standard should only define an
> alphabetical ordering within three groups of characters - small letters,
> capital letters and digits. Anything else is based on an extended
> collating sequence which is implementation defined.

This is silly.  Different European languages collate the same symbols
differently.  (Think about the Spanish rule that treats "ll" as a
single letter.)  If you want locale-dependent collating, you are
talking about a relation between character SEQUENCES, not single
characters.  Since 1987 at the latest I have been saying that the
Prolog standard ought to have two separate comparison predicates:

    compare(R, X, Y)
        as at present, where the relative order of two texts of the
        same type is the same as the relative order of the lists of
        integers representing their names;

    collate(R, X, Y)
        locale-dependent ordering, where the relative order of texts
        is not necessarily reducible to an ordering on characters; it
        should sort lower and upper case together, e.g. straße and
        STRASSE should be similar.

(Yes, one of those words has 6 characters and the other 7, but they
differ only in case...)
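To make the compare/collate distinction concrete, here is a minimal
sketch in C (C because the argument above is that Prolog should stay
close to C's wchar_t).  code_compare mirrors what compare/3 does on
texts: it orders two wide strings by the integer codes of their
characters.  toy_collate is a hypothetical stand-in for collate/3,
not any real library's collation: it only folds ASCII case and expands
U+00DF (sharp s) to "ss", and it knows nothing about the Spanish "ll".
It is just enough to show straße and STRASSE differing under the code
order but coming out equal under collation, because one character's
key can be two characters long.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* compare/3-style order: lexicographic on the integer codes of the
   characters, so with ASCII-compatible codes 'S' (83) sorts before
   's' (115).  Returns -1, 0 or 1. */
int code_compare(const wchar_t *x, const wchar_t *y)
{
    while (*x != L'\0' && *x == *y) {
        x++;
        y++;
    }
    if (*x == *y)
        return 0;              /* both strings ended together */
    return *x < *y ? -1 : 1;   /* a proper prefix sorts first */
}

/* Hypothetical folding step for the toy collation: ASCII capitals are
   lower-cased and U+00DF (sharp s) is expanded to "ss".  The output is
   truncated, but still NUL-terminated, if it would not fit in cap. */
static void fold_key(const wchar_t *src, wchar_t *dst, size_t cap)
{
    size_t n = 0;
    assert(cap > 0);
    for (; *src != L'\0'; src++) {
        wchar_t c = *src;
        if (c == L'\u00df') {
            if (n + 2 >= cap)
                break;
            dst[n++] = L's';
            dst[n++] = L's';
        } else {
            if (n + 1 >= cap)
                break;
            dst[n++] = (c >= L'A' && c <= L'Z') ? (wchar_t)(c - L'A' + L'a') : c;
        }
    }
    dst[n] = L'\0';
}

/* Hypothetical collate/3 stand-in: compare the folded keys.  Since a
   single character may fold to two, this orders character SEQUENCES,
   not single characters. */
int toy_collate(const wchar_t *x, const wchar_t *y)
{
    wchar_t kx[64], ky[64];
    fold_key(x, kx, sizeof kx / sizeof kx[0]);
    fold_key(y, ky, sizeof ky / sizeof ky[0]);
    return code_compare(kx, ky);
}
```

With these definitions, code_compare(L"stra\u00dfe", L"STRASSE")
returns 1 (lower-case 's' has a larger code than 'S'), while
toy_collate on the same pair returns 0.  A real locale-aware ordering
would be table-driven; in the C library that job belongs to strcoll
and wcscoll.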
The distinction is of great practical importance: to obtain fast
Prolog programs in a wide range of applications we *MUST* have
***FAST*** comparison, and collate/3, which has to compare character
sequences under locale-dependent rules, is likely to be slow.  So
setof/3 should use the fast comparison, compare/3.

[My postings to this group on this topic may be reproduced by anyone
for any purpose.]
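Why setof/3 cares about comparison speed: it sorts its solutions and
removes duplicates, so the comparison runs many times per call.  A
minimal C sketch of that sort-then-deduplicate step follows, assuming
(for illustration only) that the solutions are plain wide strings
rather than arbitrary terms; sort_unique and fast_cmp are hypothetical
names.  The comparison is wcscmp, which the C standard defines to
order wide characters as integers of the underlying wchar_t type,
i.e. the compare/3-style order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* qsort comparator using the fast, compare/3-style order: wcscmp
   orders wide characters as integers of the wchar_t type. */
static int fast_cmp(const void *a, const void *b)
{
    const wchar_t *const *x = a;
    const wchar_t *const *y = b;
    return wcscmp(*x, *y);
}

/* Hypothetical setof/3-style step: sort the texts, then drop adjacent
   duplicates in place.  Returns the number of distinct texts, which
   are left at the front of items. */
size_t sort_unique(const wchar_t **items, size_t n)
{
    size_t out = 0;
    assert(items != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(items, n, sizeof items[0], fast_cmp);
    for (size_t i = 1; i < n; i++) {
        if (wcscmp(items[out], items[i]) != 0)
            items[++out] = items[i];
    }
    return out + 1;
}
```

For example, with const wchar_t *v[] = {L"b", L"a", L"b", L"A"},
sort_unique(v, 4) returns 3 and leaves L"A", L"a", L"b" at the front
(with ASCII-compatible codes, upper case sorts first).  Each of the
comparisons qsort makes (typically O(n log n) of them) is a cheap
code-by-code scan; plugging a collating function such as wcscoll into
fast_cmp instead would pay the locale-dependent cost on every one.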