Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!decwrl!amdcad!lll-crg!seismo!rochester!ken
From: ken@rochester.UUCP (Ipse dixit)
Newsgroups: net.internat
Subject: Re: Int'l character sets (really sorting Chinese)
Message-ID: <15711@rochester.UUCP>
Date: Fri, 28-Feb-86 11:41:32 EST
Article-I.D.: rocheste.15711
Posted: Fri Feb 28 11:41:32 1986
Date-Received: Sat, 1-Mar-86 22:45:35 EST
References: <172@bu-cs.UUCP> <1176@enea.UUCP> <3268@sun.uucp> <1026@dcl-cs.UUCP>
Reply-To: ken@rochester.UUCP (Ipse dixit)
Distribution: net
Organization: Sans Serif
Lines: 27

In article <1026@dcl-cs.UUCP> craig@comp.lancs.ac.uk (Craig Wylie) writes:
>I have seen some English Chinese dictionaries that use the number of strokes
>in the character to sort.

There are two common lexicographic orderings for Chinese dictionaries:

1. Sorted by major radical, then by number of strokes. The radicals are
also sorted by number of strokes (I don't remember what to do for ties).
Thus words to do with "wood" come before words to do with "metal" because
"metal" (really "gold", the king of metals) has more strokes.

2. The Four Corner Digit method. There are 10 classes of strokes and the
four corners are assigned digits corresponding to the nearest stroke.
Then one uses the 4 digit number to index into the dictionary. A fifth
digit is sometimes used to disambiguate. A lot like hashing.

I prefer the second method because it is fast (for me, but I was regarded
as a radical [pun intended] when I used a FCD dictionary in school ages
ago). The disadvantage is that unrelated words will sort together. For
this reason the traditional radical sort is used for phone books, etc.

	Ken
-- 
UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa
Snail: Comp. of Disp. Sci., U. of Roch., NY 14627. Voice: Ken!
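The two dictionary orderings described in the post can be sketched in Python 3.9+ as follows. This is a minimal sketch under stated assumptions, not how any real dictionary is compiled. The radical sort is a tuple sort key of (strokes in the radical, radical, strokes outside the radical). The post leaves ties between radicals open, so breaking them by Kangxi radical number (75 for wood, 167 for metal/gold) is my assumption. The Four Corner part just buckets characters by a 4-digit code, with an optional fifth digit to narrow a lookup. The codes used here (`char_a` and the like) are hypothetical placeholders, not real Four Corner values, and the names `Entry`, `radical_sort`, `four_corner_index` and `four_corner_lookup` are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    char: str
    radical: int           # Kangxi radical number (75 = wood, 167 = metal/gold)
    radical_strokes: int   # strokes in the radical itself
    residual_strokes: int  # strokes outside the radical


def radical_key(e: Entry) -> tuple[int, int, int, str]:
    # Radical stroke count first; radical number breaks ties between radicals
    # with equal stroke counts (an assumption); then the remaining strokes;
    # finally the character itself as an arbitrary last resort.
    return (e.radical_strokes, e.radical, e.residual_strokes, e.char)


def radical_sort(entries: list[Entry]) -> list[str]:
    return [e.char for e in sorted(entries, key=radical_key)]


ENTRIES = [
    Entry("銀", 167, 8, 6),  # silver: metal + 6 strokes
    Entry("林", 75, 4, 4),   # grove: wood + 4 strokes
    Entry("針", 167, 8, 2),  # needle: metal + 2 strokes
    Entry("村", 75, 4, 3),   # village: wood + 3 strokes
]

# Hypothetical Four Corner codes (4 digits plus an optional 5th); not real values.
HYPOTHETICAL_CODES = {
    "char_a": "12345",
    "char_b": "12340",
    "char_c": "67890",
}


def four_corner_index(codes: dict[str, str]) -> dict[str, list[str]]:
    # Bucket characters by their 4-digit code, much like hashing: characters
    # that share a bucket need not be related at all.
    buckets: dict[str, list[str]] = defaultdict(list)
    for char, code in codes.items():
        if not (code.isdigit() and len(code) in (4, 5)):
            raise ValueError(f"bad Four Corner code {code!r} for {char!r}")
        buckets[code[:4]].append(char)
    return dict(buckets)


def four_corner_lookup(codes: dict[str, str], query: str) -> list[str]:
    # A 4-digit query returns the whole bucket; adding the 5th digit narrows it.
    return [char for char, code in codes.items() if code.startswith(query)]


if __name__ == "__main__":
    print(radical_sort(ENTRIES))
    print(four_corner_index(HYPOTHETICAL_CODES))
    print(four_corner_lookup(HYPOTHETICAL_CODES, "12345"))
```

Sorting by radical keeps 村 and 林 (both under wood, 4 strokes) ahead of 針 and 銀 (under metal, 8 strokes), which matches the wood-before-metal example. The Four Corner buckets put `char_a` and `char_b` together purely because their codes collide, which is the "unrelated words sort together" drawback.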