Newsgroups: comp.misc
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!lethe!tvcent!comspec!scocan!larryp
From: larryp@sco.COM (Larry Philps)
Subject: Re: International Character Sets
Organization: SCO Canada, Inc.
Date: Thu, 02 May 1991 12:51:58 GMT
Message-ID: <1991May02.125158.20032@sco.COM>
References: <1991May1.131212.8983@cbnewsl.att.com>
Keywords: standards multibyte
Sender: news@sco.COM (News administration)

In <1991May1.131212.8983@cbnewsl.att.com> jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:
> Could somebody please give me pointers to standards on international
> character sets?

Here are the ones I know of:

1) ASCII - 'Nuff said.

2) EBCDIC - More than enough said.

3) IBM pc850 - The standard multilingual PC character set. Very similar
   in repertoire to ISO 8859/1, though the code positions differ.

4) HP Roman8 - HP's equivalent of the above. Also very similar to
   ISO 8859/1.

5) ISO8859 - This is a set of nine 8-bit codesets that can handle most
   alphabetic languages. These are published final standards.

6) EUC - This is the Extended Unix Codeset. Characters can be 1, 2, 3
   or 4 bytes in length, and can be intermixed. This is actually
   reasonably popular, and is the base for AT&T's MNLS product. I have
   misplaced my reference; as far as I know it is built on the ISO 2022
   code extension techniques rather than being an ISO standard itself.

7) SJIS - JIS stands for Japanese Industrial Standard, and SJIS is
   called Shift-JIS for some reason I have never figured out. It uses
   16-bit characters to encode Kanji, but also allows single-byte ASCII
   characters.

8) ISO 10646 - This is a proposed ISO standard for a 32-bit character
   set. In this character set, each "character" has a prefix that
   specifies which "code set" the rest of the character is an index
   into. Clear? For example, one prefix would indicate ISO 8859/1, and
   the remaining bits would be an index into that character set.

9) Unicode - This is being developed by a consortium of companies
   including IBM, Microsoft, Sun, and NeXT. It is a 16-bit character
   set that tries to handle all the characters for many languages by
   mapping identical shapes to the same position in Unicode, regardless
   of what the character's name is in different languages. In
   particular, the Chinese, Korean and Japanese symbols have been
   distilled down to about 18,000 unique characters (I think). I don't
   have a good reference for this one.

Have fun. It's a brutal world out there.

---
Larry Philps, SCO Canada, Inc (Formerly: HCR Corporation)
Postman:  130 Bloor St. West, 10th floor, Toronto, Ontario.  M5S 1N5
InterNet: larryp@sco.COM  or  larryp%scocan@uunet.uu.net
UUCP:     {uunet,utcsri,sco}!scocan!larryp
Phone:    (416) 922-1937
Fax:      (416) 922-8397
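
As a footnote to items 3 through 7: the mix of single-byte and multibyte characters, and the way the 8-bit sets disagree on code positions, is easier to see in bytes than in prose. The following is only a minimal sketch using the codecs bundled with Python. The helper name `encoded_widths` and the sample characters are assumptions chosen for illustration, not anything from the standards themselves.

```python
def encoded_widths(text, codec):
    """Return how many bytes each character of `text` occupies in `codec`."""
    return [len(ch.encode(codec)) for ch in text]


# Sample string (an assumption for illustration): ASCII 'A' plus one kanji, U+6F22.
sample = "A\u6f22"

# Shift-JIS and the Japanese flavour of EUC both keep ASCII as a single
# byte and spend two bytes on the kanji, so the widths are intermixed.
for codec in ("shift_jis", "euc_jp"):
    print(codec, encoded_widths(sample, codec), sample.encode(codec).hex())

# The same accented letter (e with acute, U+00E9) exists in all three 8-bit
# sets, but each set puts it at its own code position.
for codec in ("latin-1", "cp850", "hp_roman8"):
    print(codec, "\u00e9".encode(codec).hex())
```

Encoding one character at a time is safe here because these particular codecs carry no shift state between characters; a stateful encoding would need the whole string encoded in one go.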