Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 +MULTI+2.11; site stc.UUCP
Path: utzoo!linus!philabs!cmcl2!harvard!seismo!mcvax!ukc!stc!andrew
From: andrew@stc.UUCP
Newsgroups: net.internat
Subject: Re: Alphabetical Order
Message-ID: <700@stc-b.stc.UUCP>
Date: Thu, 14-Nov-85 11:49:07 EST
Article-I.D.: stc-b.700
Posted: Thu Nov 14 11:49:07 1985
Date-Received: Sun, 17-Nov-85 05:26:51 EST
References: <125100001@ima.UUCP> <2435@sunybcs.UUCP> <787@inset.UUCP> <35@diku.UUCP> <36@diku.UUCP> <40@diku.UUCP>
Reply-To: andrew@stc.UUCP (Andrew Macpherson)
Organization: STC Telecoms, London N11 1HB.
Lines: 36
Summary: Almost a red herring.
Xpath: stc stc-b stc-b stc-a

I think we are in danger of confusing two different aspects of sorting.

1/ The simple case of ``sorting'' in whatever-my-machine-likes order
   for table lookup (automated binary searches, hash lookup etc.)

2/ Sorting for human consumption. This is almost certainly not
   character set order, and may not be even remotely related to it
   (yes, even in English).

Type 1 is largely irrelevant to internationalisation, except inasmuch as this is the type of operation carried out by *all* our general-purpose utilities. There is little need to change these, as we are doubtless more interested in their internal efficiency than in the external order (cf. dbm).

The other question (type 2) is a much more involved operation. I suggest we all reach for D. E. Knuth's book ``The Art of Computer Programming'' volume 3 ``Sorting and Searching'' pp 7-9 exercise 16. This spells out the problem much better than I could (and cuts down the total news traffic).

It is fairly obvious that real sorting for humans will involve sufficient heuristics to make the natural order of the internal character set immaterial. It seems likely that such sorting will have to be done on a per-language (and to a certain extent per-country) basis.
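
The difference between the two types can be shown with a minimal sketch in Python. Everything named here is hypothetical and chosen only for illustration: the `ALPHABETS` table, the `human_key` function and the sample words. It assumes a deliberately tiny model of type 2 sorting (rank letters by a per-language alphabet, ignore case) and is nowhere near the full set of heuristics that real human-oriented sorting needs.

```python
# Hypothetical per-language alphabets; real collation rules are far richer.
ALPHABETS = {
    "en": "abcdefghijklmnopqrstuvwxyz",
    "da": "abcdefghijklmnopqrstuvwxyzæøå",  # Danish places æ, ø, å after z
}


def machine_order(words):
    """Type 1: plain code-point comparison, good enough for lookup tables."""
    return sorted(words)


def human_key(word, lang):
    """Type 2 (simplified): rank letters by the language's alphabet, ignoring case."""
    alphabet = ALPHABETS[lang]
    rank = {ch: i for i, ch in enumerate(alphabet)}
    # Characters outside the alphabet sort after every known letter.
    primary = tuple(rank.get(ch, len(alphabet) + ord(ch)) for ch in word.lower())
    # Fall back to the original spelling so that ties are broken consistently.
    return (primary, word)


def human_order(words, lang):
    return sorted(words, key=lambda w: human_key(w, lang))


words = ["zebra", "Apple", "åben", "ære", "øl", "Zoo", "apple"]

print(machine_order(words))
print(human_order(words, "da"))
```

In code-point order every capital sorts before every lower-case letter, and å precedes æ and ø; with the Danish table the capitals interleave and the three extra letters come out as æ, ø, å. The same `words` list would need a different table (and probably different rules) for another language or country, which is why the internal character set order ends up mattering so little.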

This is not to say that multiple alphabets and their internal representation are not relevant and interesting (I can't contribute to that discussion), merely that how such multi-lingual text sorts in simple per-character comparisons is almost a red herring.

--
Regards,
	Andrew Macpherson.
{aivru,creed,datlog,iclbra,iclkid,idec,inset,root44,stl,ukc}!stc!andrew