Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site decvax.UUCP
Path: utzoo!linus!decvax!minow
From: minow@decvax.UUCP (Martin Minow)
Newsgroups: net.lang.c,net.unix-wizards
Subject: C and national character sets
Message-ID: <60@decvax.UUCP>
Date: Wed, 29-Aug-84 21:41:27 EDT
Article-I.D.: decvax.60
Posted: Wed Aug 29 21:41:27 1984
Date-Received: Thu, 30-Aug-84 12:39:11 EDT
References: <265@diku.UUCP>
Organization: DEC UNIX Engineering Group
Lines: 34

Keld J|rn Simonsen brings up an important point concerning C
and its standardization.  (By the way, the | stands for the
slashed-o character in the Danish national character set; the
same code is o-umlaut in the Swedish and German sets.)  He notes
that several characters used by C are
reserved by ISO standards for "national replacement characters" (NRCs).
The reserved characters are #@[\]^_`{|}~ -- most of which are
used in some way by C.  There isn't any really good solution --
it is highly unlikely that the C standardization committee will
remove these characters from the language.  While most of them
can be replaced by suitable #defines, several cannot, notably
backslash.  The only short-term solution would be for the
parties affected to write NRC-specific pre-processors.
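
To make the #define approach concrete, here is a minimal sketch.
The macro names (BEGIN, END, BITOR, XOR, COMPL, AT) are made up for
illustration and are not part of C.  The definitions themselves must
still be typed once with the real characters, say in a header
prepared on an ASCII terminal.  Nothing comparable works for
backslash, because macros are never expanded inside character and
string constants, which is where backslash is needed.

```c
#include <stddef.h>

/* Hypothetical spellings; none of these names come from the C language. */
#define BEGIN       {
#define END         }
#define BITOR(a, b) ((a) | (b))
#define XOR(a, b)   ((a) ^ (b))
#define COMPL(a)    (~(a))
#define AT(p, i)    (*((p) + (i)))  /* array indexing via pointer arithmetic */

/*
 * From here on the source avoids the bracket, brace, vertical bar,
 * caret and tilde characters entirely.
 * Returns the bitwise OR of the n integers that v points to.
 */
int sum_flags(const int *v, size_t n)
BEGIN
    int acc = 0;
    size_t i;

    for (i = 0; i < n; i++)
        acc = BITOR(acc, AT(v, i));
    return acc;
END
```

Code written this way compiles unchanged on an ASCII system, but
it reads oddly and does nothing for '\n' or line continuation.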

In the long term, however, the problem will go away as people
move to an 8-bit character set such as DEC Multinational or
the pending ISO standard that is almost identical to it.
In this standard, the characters in the range 0-127 are identical
to the U.S. ASCII 7-bit standard.  Characters in the range
128-159 are used for additional controls, and 160-255 for
additional graphics.
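
The layout just described can be sketched as a small classifier.
The enum and function names are hypothetical, chosen only for this
example.  The argument is an unsigned char because plain char may
be signed, in which case codes above 127 would show up as negative
values.

```c
/* Hypothetical names for the three regions of the 8-bit code space. */
enum char_class {
    CC_ASCII,          /* 0-127: same as 7-bit U.S. ASCII */
    CC_C1_CONTROL,     /* 128-159: additional controls    */
    CC_GRAPHIC_8BIT    /* 160-255: additional graphics    */
};

/* Report which region of the 8-bit set the code c falls in. */
enum char_class classify(unsigned char c)
{
    if (c < 128)
        return CC_ASCII;
    if (c < 160)
        return CC_C1_CONTROL;
    return CC_GRAPHIC_8BIT;
}
```

When the data comes from a plain char buffer, cast it first:
classify((unsigned char)s[i]).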

It is actually possible -- though rather messy -- to intermix
NRCs and Multinational, allowing Standard C to be written from
a terminal that normally displays a non-English NRC set.
Unfortunately, this will require a pre-processor that understands
the character-set switching escape sequences.  This could
be done as a Unix filter, of course.
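
Here is a rough sketch of the core of such a filter, under several
assumptions that are mine and not part of any C proposal: the input
starts in ASCII, the switch sequences have the three-byte form
"ESC ( F", the final byte 'B' selects ASCII and '7' selects the
Swedish NRC, and the six national letters map to the 8-bit codes
shown in the table.  The function names are hypothetical.  The
switch sequences are dropped from the output, bytes typed in NRC
mode become 8-bit letters, and everything else passes through.

```c
#include <stddef.h>

#define ESC         0x1B
#define FINAL_ASCII 'B'   /* assumed: "ESC ( B" selects ASCII       */
#define FINAL_NRC   '7'   /* assumed: "ESC ( 7" selects Swedish NRC */

/* Map one byte typed in NRC mode to an assumed 8-bit code. */
static unsigned char nrc_to_8bit(unsigned char c)
{
    switch (c) {
    case '[':  return 0xC4;   /* A-umlaut        */
    case '\\': return 0xD6;   /* O-umlaut        */
    case ']':  return 0xC5;   /* A-ring          */
    case '{':  return 0xE4;   /* a-umlaut        */
    case '|':  return 0xF6;   /* o-umlaut        */
    case '}':  return 0xE5;   /* a-ring          */
    default:   return c;      /* invariant chars */
    }
}

/*
 * Copy n bytes from in to out, removing the switch sequences and
 * translating bytes seen while the NRC is selected.  out must have
 * room for at least n bytes.  Returns the number of bytes written.
 */
size_t nrc_filter(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    int nrc = 0;              /* assumption: input starts in ASCII */

    while (i < n) {
        if (in[i] == ESC && i + 2 < n && in[i + 1] == '(' &&
            (in[i + 2] == FINAL_ASCII || in[i + 2] == FINAL_NRC)) {
            nrc = (in[i + 2] == FINAL_NRC);
            i += 3;
            continue;
        }
        out[o++] = nrc ? nrc_to_8bit(in[i]) : in[i];
        i++;
    }
    return o;
}
```

A real filter would wrap this in a main() that reads standard input
and writes standard output, and would have to cope with a switch
sequence split across two reads, which this sketch does not.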

Hope this helps.  Hej s} l{nge.

Martin Minow
decvax!minow