Path: utzoo!attcan!uunet!aplcen!haven!decuac!shlump.nac.dec.com!mountn.dec.com!minow
From: minow@mountn.dec.com (Martin Minow)
Newsgroups: comp.std.misc
Subject: Re: Int'l Character set
Summary: Many languages are standardized
Message-ID: <1667@mountn.dec.com>
Date: 12 Jun 90 16:05:13 GMT
References: <1647@mountn.dec.com> <71@lysator.liu.se>
Reply-To: minow@thundr.enet.dec.com (Martin Minow)
Organization: Digital Equipment Corporation
Lines: 53

In article <71@lysator.liu.se> aronsson@lysator.liu.se (Lars Aronsson) writes:
>
>I live in Sweden, where the national alphabet contains A-Z plus three
>"umlaut" letters. The Swedish version of the old 7-bit ISO-646 (called
>Swedish ASCII) has replaced the "[", "\", and "]" characters in ASCII
>for the national letters, as has many versions of ISO-646 in other
>European countries. ...
>When ISO published the 8859 standard in 1989 (or 1988?), many of us
>thought this problem was solved once and for all. ...
> Northern Europe will use ISO 8859-1.

Well, not quite: for example, the Same (Lapp) languages require
characters that are not in ISO 8859-1. This might also be the case
for Irish and Basque (I'm not completely certain).

>
>Now, if we are lucky, all these letters are in ISO 8859-1 thru -9 (are
>there nine?). But it seems we are stuck with this switching between
>character sets that we know so well from the 7-bit era. Does the ISO
>8859 have codes that tell the equipment to set the right version or
>will we still do this by hand? Or should I learn postscript?

ISO 8859 and its allied standards (notably ISO 2022) define escape
sequences for switching between character sets. A recent DEC terminal
programmer's guide (for, say, the VT300 series) will describe them.
As I recall, there are character sets in the ISO 8859 family for the
Slavic languages, including Cyrillic, so, if your terminal images the
character set, you should be able to switch between character sets
with reasonable ease.
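
To make the switching concrete, here is a rough Python sketch of a
decoder that changes its 8-bit character set when it sees an ISO
2022-style designation of the form ESC - F. The names decode_switched
and G1_SETS are made up for this illustration, and the final bytes
"A" (Latin-1) and "L" (Latin/Cyrillic) are, to the best of my
knowledge, the registered ones; check a terminal programmer's guide
before relying on them. A real terminal also handles shift functions
and many more sets, which this toy ignores.

```python
# Toy decoder: switch the active 8-bit set on "ESC - <final>" sequences.
# Assumption: final byte "A" selects the ISO 8859-1 upper half and "L"
# the ISO 8859-5 (Latin/Cyrillic) upper half; other sets are omitted.
ESC = 0x1B
HYPHEN = 0x2D  # intermediate byte: designate a 96-character set

G1_SETS = {
    ord("A"): "iso8859_1",
    ord("L"): "iso8859_5",
}


def decode_switched(data: bytes, default: str = "iso8859_1") -> str:
    """Decode data, honoring the designation sequences listed in G1_SETS."""
    codec = default
    out = []
    i = 0
    while i < len(data):
        seq = data[i + 1:i + 3]
        if (data[i] == ESC and len(seq) == 2
                and seq[0] == HYPHEN and seq[1] in G1_SETS):
            codec = G1_SETS[seq[1]]  # switch sets, emit nothing
            i += 3
            continue
        # Decode one byte with whichever set is currently designated.
        out.append(data[i:i + 1].decode(codec))
        i += 1
    return "".join(out)


# The same byte value (0xE5) is read three times; only the designated
# set changes between the reads.
sample = b"\xe5" + b"\x1b-L" + b"\xe5" + b"\x1b-A" + b"\xe5"
text = decode_switched(sample)
```

In this sketch the byte 0xE5 comes out as a-with-ring under Latin-1
and as a Cyrillic letter under ISO 8859-5, which is exactly why the
receiving equipment has to be told which set is in effect.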

The important point about ISO 8859 is that switching need not be done
for text shared between two dozen languages, and is handled in a more
coherent manner for other languages.

The Macintosh takes a somewhat different approach: all text on the
Macintosh is tagged with a font name (either implicitly or explicitly).
The font definition contains imaging information: bitmaps for the
screen and PostScript for the printer (I'm simplifying somewhat).
There need be no correlation between the character code and any
particular ISO/ASCII character. For example, I developed a font for
an internationally standardized symbol set (for Orienteering) that
groups symbols in a system completely different from "ASCII."

PostScript, for that matter, uses an internal database to associate
a character code with a character name, and the name is associated
with the PostScript program that draws that shape (again, simplifying).
Thus, if you can send PostScript directly to a printer, you can send
"lower-case-swedish-a-with-ring" to image that specific shape.

In the long term, systems will migrate to a 32-bit character set
(under development as ISO 10646). Here, too, escape sequences will
be needed to access subsets of the larger character set.

Martin Minow
minow@thundr.enet.dec.com
The above does not represent the position of Digital Equipment Corporation.