Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!ncar!ico!rcd
From: rcd@ico.isc.com (Dick Dunn)
Newsgroups: comp.fonts
Subject: Re: X kanji font k14 question
Summary: standard "JIS" encoding
Message-ID: <1990Oct31.005324.13296@ico.isc.com>
Date: 31 Oct 90 00:53:24 GMT
References: <29326@pasteur.Berkeley.EDU>
Organization: Interactive Systems Corporation, Boulder, CO
Lines: 51

shirriff@sprite.berkeley.edu (Ken Shirriff) writes:
> Can anyone explain the order of the kanji characters in the X font "k14"
> (JISX0208.1983 encoding)?  It has several thousand kanji characters, but
> as far as I can tell, they're in a random order.

It's a standard encoding of a Japanese character set commonly referred to--
with horrendous over-abbreviation--as "JIS".  It's actually JIS C 6226,
"Code of the Japanese Graphic Character Set for Information Interchange".
(BTW, this is a 2-byte code; don't confuse it with "JISCII" (6220) which is
a one-byte code.)

The arrangement of characters in 6226 goes something like this:
	punctuation (including some for vertical writing)
	special characters
	arabic numerals, 26-char Roman alphabet
	Hiragana
	Katakana
	Greek
	Cyrillic
	line-drawing characters
	two sets of Kanji

In the Hiragana and Katakana, there are separate codes for all possible
characters including the "small" forms and forms with the "diacritical
marks".  (Apologies to Japanese speakers/writers; I am trying to use terms
here which will be understood by western readers.)

There are two groups of Kanji in the remainder of the character set.  The
basis for assigning encodings is different for the two, which is possibly
why it looked "random" to you.  Level 1 Kanji (0x3021-4f53) contains the
more common characters; it is arranged by pronunciation.  Level 2 Kanji
(0x5021-end) contains less common characters, arranged by their primary
radicals.  Again, for the western view: radicals are essentially the major
stroke groups which make up the ideogram.
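
As a rough illustration of the two kanji ranges just quoted, here is a
minimal Python sketch that sorts a 2-byte code into its region.  The
boundaries 0x3021, 0x4f53 and 0x5021 are the ones given above; the function
names, the 0x21-0x7e limit on each byte, and the region labels are my own
assumptions for the example, not anything taken from the k14 font or the
X sources.

```python
# Sketch only: classify a 2-byte JIS code by the regions described above.
# The three kanji boundaries come from the post; everything else here
# (names, labels, the per-byte 0x21-0x7e check) is assumed for illustration.

LEVEL1_FIRST = 0x3021   # first Level 1 kanji (ordered by pronunciation)
LEVEL1_LAST = 0x4F53    # last Level 1 kanji
LEVEL2_FIRST = 0x5021   # first Level 2 kanji (ordered by radical)


def row_cell(code):
    """Split a 16-bit code into (row, cell), each counted from 1."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError("not a valid 2-byte code: %#06x" % code)
    return hi - 0x20, lo - 0x20


def region(code):
    """Name the broad region a code falls in."""
    row_cell(code)  # rejects bytes outside 0x21-0x7e
    if code < LEVEL1_FIRST:
        # Punctuation, kana, Greek, Cyrillic, line drawing, etc.
        # Unassigned codes in this area are not distinguished.
        return "non-kanji"
    if code <= LEVEL1_LAST:
        return "level 1 kanji"
    if code < LEVEL2_FIRST:
        return "unassigned"
    # The end of Level 2 is not given above, so every remaining code is
    # reported as Level 2, including any that are actually unassigned.
    return "level 2 kanji"
```

With these definitions, region(0x3021) reports "level 1 kanji" and
row_cell(0x3021) gives row 16, cell 1.
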
If you look through Level 2, you'll see "lots of characters which have
similar pieces" grouped together.

If you think about it, the matter of a lexical ordering in a writing system
using a large number of distinct symbols, instead of composing from a small
alphabet, is an interesting (challenging) exercise.

The actual characters in the font look as if they follow the JIS 16x16
standard bitmaps, although my weary eyes aren't up to checking that very
thoroughly.  I heard a mention that there may be a copyright or license
problem with the k14 font, but I don't know what it is; if you are
considering using it, I'd look further.
--
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...but Meatball doesn't work that way!