Path: utzoo!attcan!uunet!aplcen!haven!decuac!shlump.nac.dec.com!mountn.dec.com!minow
From: minow@mountn.dec.com (Martin Minow)
Newsgroups: comp.std.misc
Subject: Re: Int'l Character set
Summary: Many languages are standardized
Message-ID: <1667@mountn.dec.com>
Date: 12 Jun 90 16:05:13 GMT
References: <1647@mountn.dec.com> <71@lysator.liu.se>
Reply-To: minow@thundr.enet.dec.com (Martin Minow)
Organization: Digital Equipment Corporation
Lines: 53

In article <71@lysator.liu.se> aronsson@lysator.liu.se (Lars Aronsson) writes:
>
>I live in Sweden, where the national alphabet contains A-Z plus three
>"umlaut" letters. The Swedish version of the old 7-bit ISO-646 (called
>Swedish ASCII) has replaced the "[", "\", and "]" characters in ASCII
>with the national letters, as have many versions of ISO-646 in other
>European countries.  ...
>When ISO published the 8859 standard in 1989 (or 1988?), many of us
>thought this problem was solved once and for all. ...
> Northern Europe will use ISO 8859-1.

Well, not quite: for example, the Sami (Lapp) languages require characters
that are not in ISO 8859-1.  This might also be the case for Irish and
Basque, though I'm not completely certain.
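
To make both problems concrete, here is a minimal Python sketch. The table `SWEDISH_646` and both helper functions are hypothetical illustrations, and the table holds only the six commonly cited Swedish replacements (the national variant may redefine other positions as well). The first helper shows C-style brackets turning into letters on a 7-bit Swedish device. The second uses Python's built-in `latin_1` codec to check what ISO 8859-1 can encode: the Swedish letters fit, while letters used in Northern Sami such as č, đ, ŋ, š, ŧ and ž do not.

```python
# Minimal sketch, assuming only the six best-known Swedish ISO 646
# replacements; the national variant may redefine other positions too.
SWEDISH_646 = {
    0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å",  # replace ASCII [ \ ]
    0x7B: "ä", 0x7C: "ö", 0x7D: "å",  # replace ASCII { | }
}


def show_as_swedish_646(ascii_text: str) -> str:
    """Show how ASCII text looks on a device using the Swedish 7-bit set."""
    return "".join(SWEDISH_646.get(ord(ch), ch) for ch in ascii_text)


def fits_latin1(text: str) -> bool:
    """Return True if every character has a code in ISO 8859-1."""
    try:
        text.encode("latin_1")
    except UnicodeEncodeError:
        return False
    return True


print(show_as_swedish_646("a[i] = b[j];"))  # aÄiÅ = bÄjÅ;
print(fits_latin1("Räksmörgås"))            # True
print(fits_latin1("čđŋšŧž"))                # False (Sami letters)
```

The same seven bits cannot mean both "[" and "Ä", which is why 7-bit users had to switch character sets by hand.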

>
>Now, if we are lucky, all these letters are in ISO 8859-1 thru -9 (are
>there nine?). But it seems we are stuck with this switching between
>character sets that we know so well from the 7-bit era. Does the ISO
>8859 have codes that tell the equipment to set the right version or
>will we still do this by hand? Or should I learn postscript?

ISO 8859 and allied standards (notably ISO 2022, which defines the code
extension techniques) specify escape sequences for switching between
character sets.  A recent DEC terminal programmer's guide (for, say, the
VT300 series) will describe them.  As I recall, there are character sets
in the ISO 8859 family for the Slavic languages, including those written in
Cyrillic, so if your terminal can image (display) the character set, you
should be able to switch between character sets with reasonable ease.  The
important point about ISO 8859 is that no switching is needed for text in
any of the two dozen or so languages covered by a single part, and that
switching is handled in a more coherent manner for the other languages.
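
A simplified Python sketch of that switching idea follows. It assumes ISO 2022 designation sequences of the form ESC - F, where "-" (0x2D) designates a 96-character set to G1 and the final byte F selects the set (as I understand the international register: A for 8859-1, B for 8859-2, F for 8859-7 Greek, L for 8859-5 Cyrillic). It further assumes G1 is always invoked into the right half of the code table, and ignores locking shifts, G2/G3 and 7-bit operation. The function name `decode_switched` is hypothetical; the codec names are Python's own.

```python
# Simplified sketch of ISO 2022-style switching between ISO 8859 parts.
# Assumptions: a 96-character set is designated to G1 by ESC 0x2D F, and G1
# is treated as permanently invoked into the right half (bytes 0xA0-0xFF).
# Locking shifts, G2/G3 and 7-bit operation are deliberately ignored.
ESC = 0x1B
DESIGNATE_96_TO_G1 = 0x2D  # the "-" intermediate byte

# Final byte F of the escape sequence -> Python codec for that 8859 part.
FINAL_BYTE_CODECS = {
    ord("A"): "latin_1",    # ISO 8859-1 (Latin-1)
    ord("B"): "iso8859_2",  # ISO 8859-2 (Latin-2, Central European)
    ord("F"): "iso8859_7",  # ISO 8859-7 (Greek)
    ord("L"): "iso8859_5",  # ISO 8859-5 (Cyrillic)
}


def decode_switched(data: bytes, codec: str = "latin_1") -> str:
    """Decode bytes, switching 8859 parts whenever ESC - F appears."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 2 < len(data) and data[i + 1] == DESIGNATE_96_TO_G1:
            codec = FINAL_BYTE_CODECS[data[i + 2]]  # KeyError if not in table
            i += 3
            continue
        # Below 0x80 every ISO 8859 part agrees with ASCII; above it the
        # currently designated part decides what the byte means.
        out.append(bytes([data[i]]).decode(codec))
        i += 1
    return "".join(out)


# Swedish, French and German share Latin-1: no switching needed.
print(decode_switched("Räksmörgås, café, Straße".encode("latin_1")))
# One escape sequence switches the right half to Cyrillic.
print(decode_switched(b"caf\xe9 \x1b-L\xb4\xd0"))  # café Да
```

Note that the byte 0xE9 means "é" before the escape sequence, while after it the same region of the code table holds Cyrillic letters.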

The Macintosh takes a somewhat different approach: all text on the Macintosh
is tagged with a font name (either implicitly or explicitly).  The font
definition contains imaging information: bitmaps for the screen and
PostScript for the printer (I'm simplifying somewhat).  There need be no
correlation between the character code and any particular ISO/ASCII character.
For example, I developed a font for an internationally standardized set of
orienteering symbols that arranges its symbols in a scheme completely
different from "ASCII."
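
The following Python sketch is a simplified model of font-tagged text, not the actual Macintosh data structures. Each run carries a font name, and the font alone decides what a code means. It assumes an ordinary text font ("Geneva") laid out in Mac Roman order, decoded with Python's `mac_roman` codec; the font name "OrienteeringSymbols" and its symbol names are hypothetical stand-ins.

```python
# Simplified model, not the actual Macintosh data structures: each run of
# text carries a font name, and the font alone decides what a code means.
# "OrienteeringSymbols" and its symbol names are hypothetical.
from typing import Callable, Dict, List, Tuple

Run = Tuple[str, bytes]  # (font name, character codes)

# Hypothetical symbol font: codes map to map symbols, not to letters.
ORIENTEERING_SYMBOLS = {0x41: "<start triangle>", 0x42: "<control circle>"}

FONTS: Dict[str, Callable[[int], str]] = {
    # Assumes an ordinary text font laid out in Mac Roman order.
    "Geneva": lambda code: bytes([code]).decode("mac_roman"),
    "OrienteeringSymbols": lambda code: ORIENTEERING_SYMBOLS.get(code, "<undefined>"),
}


def render(runs: List[Run]) -> List[str]:
    """Look up every code in the font its run is tagged with."""
    return [FONTS[font](code) for font, codes in runs for code in codes]


# The same code 0x41 is a letter in one font and a map symbol in the other;
# Mac Roman also puts "å" at 0x8C, where Latin-1 uses 0xE5.
print(render([("Geneva", b"A\x8c"), ("OrienteeringSymbols", b"AB")]))
```

The design point is that the code alone carries no meaning until it is paired with the font named by its run.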

PostScript, for that matter, uses an encoding vector (an internal table) to
associate a character code with a character name, and the name is associated
with the PostScript program that draws that shape (again, simplifying).  Thus,
if you can send PostScript directly to a printer, you can ask for the
lower-case Swedish a-with-ring by its name ("aring") to image that specific
shape.
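
That two-step lookup can be modelled in a few lines of Python. The glyph names ".notdef", "A" and "aring" are real PostScript names, and ISO Latin-1 places "aring" at 0xE5 while the Mac Roman layout places it at 0x8C. Everything else is an assumption for illustration: the vectors are only partially filled in, the names `LATIN1_LIKE`, `MAC_ROMAN_LIKE` and the helper functions are hypothetical, and the "procedures" are placeholder strings rather than real outlines.

```python
# Sketch of PostScript's two-step lookup (simplified): an encoding vector
# maps a code to a glyph name, and the font maps the name to the procedure
# that draws it. The "procedures" here are placeholder strings.
from typing import Dict, List


def make_encoding(assignments: Dict[int, str]) -> List[str]:
    """Build a 256-slot encoding vector; unassigned slots are .notdef."""
    vector = [".notdef"] * 256
    for code, name in assignments.items():
        vector[code] = name
    return vector


# Partial vectors: only the slots needed for this example are filled in.
LATIN1_LIKE = make_encoding({0x41: "A", 0xE5: "aring"})
MAC_ROMAN_LIKE = make_encoding({0x41: "A", 0x8C: "aring"})

GLYPH_PROCEDURES: Dict[str, str] = {
    ".notdef": "<draws nothing>",
    "A": "<outline of capital A>",
    "aring": "<outline of lower-case a with ring>",
}


def glyph_for_code(encoding: List[str], code: int) -> str:
    """Code -> name (via the encoding vector) -> drawing procedure."""
    return GLYPH_PROCEDURES[encoding[code]]


def glyph_for_name(name: str) -> str:
    """Skip the encoding vector and ask for the glyph by name."""
    return GLYPH_PROCEDURES[name]


# Different codes, same name, same shape; asking by name skips the code.
print(glyph_for_code(LATIN1_LIKE, 0xE5) == glyph_for_code(MAC_ROMAN_LIKE, 0x8C)
      == glyph_for_name("aring"))  # True
```

Because the shape is tied to the name rather than the code, changing the encoding vector re-arranges the codes without touching any drawing procedure.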

In the long term, systems will migrate to a 32-bit character set (under
development as ISO 10646).  Here, too, escape sequences will be needed
to access subsets of the larger character set.
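
For a feel of what a 32-bit code looks like, this Python sketch splits a character's code into the four octets that the published ISO/IEC 10646 calls group, plane, row and cell, and prints the same value as big-endian UCS-4 bytes via Python's `utf-32-be` codec. It assumes the published layout (the draft under development may have differed in detail) and does not model the escape sequences anticipated above. The names `UcsPosition` and `split_ucs4` are hypothetical. Row 0 of plane 0 matches ISO 8859-1, so "å" keeps its Latin-1 value 0xE5.

```python
# Sketch: view a character's ISO 10646 code as four octets (group, plane,
# row, cell), and as the same four bytes in big-endian UCS-4 form, which
# Python exposes through its "utf-32-be" codec.
from typing import NamedTuple


class UcsPosition(NamedTuple):
    group: int
    plane: int
    row: int
    cell: int


def split_ucs4(ch: str) -> UcsPosition:
    """Split one character's code point into its four octets."""
    code = ord(ch)
    return UcsPosition((code >> 24) & 0xFF, (code >> 16) & 0xFF,
                       (code >> 8) & 0xFF, code & 0xFF)


for ch in ("å", "Д"):
    print(ch, split_ucs4(ch), ch.encode("utf-32-be").hex())
# å UcsPosition(group=0, plane=0, row=0, cell=229) 000000e5
# Д UcsPosition(group=0, plane=0, row=4, cell=20) 00000414
```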

Martin Minow
minow@thundr.enet.dec.com
The above does not represent the position of Digital Equipment Corporation.