Path: utzoo!attcan!uunet!sdrc!scjones
From: scjones@sdrc.UUCP (Larry Jones)
Newsgroups: comp.lang.c
Subject: Re: Programming and international character sets.
Message-ID: <427@sdrc.UUCP>
Date: 2 Nov 88 23:04:01 GMT
References: <532@krafla.rhi.hi.is> <8804@smoke.BRL.MIL> <207@jhereg.Jhereg.MN.ORG>
Organization: Structural Dynamics Research Corp., Cincinnati
Lines: 30

In article <207@jhereg.Jhereg.MN.ORG>, mark@jhereg.Jhereg.MN.ORG (Mark H. Colburn) writes:
> [a number of misconceptions about draft ANSI C multi-byte chars:
> you can't pass them to is*() or to*(), can't tell how long they
> are, can't walk through arrays of them conveniently, etc. and
> proposes cluttering up the library with a bunch of new functions
> to handle them "correctly"]

You seem to have missed a key point in the internationalization
stuff - you don't use multi-byte characters directly, you convert
them into wchar_t's using the functions in sections 4.10.7 and
4.10.8.  wchar_t is an integral type (probably short or int) that
is large enough to hold any character value in any supported locale.

For example, the char 'A' might convert to a wchar_t value of 65
and a multi-byte sequence representing a Japanese character would
convert to a wchar_t value of 12345.  Since wchar_t's are all the
same size, you can have an array of them that you walk through with
pointers just like you're used to doing with char arrays.
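
As a rough sketch of that conversion step (not the only way to do it),
the fragment below uses mbstowcs() from the multi-byte string functions
to fill a wchar_t array and then walks it with an ordinary pointer.
The name count_chars and the fixed MAX_WIDE buffer size are my own
assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_WIDE 64   /* assumed buffer size, for illustration only */

/*
 * Count the characters (not bytes) in a multi-byte string by first
 * converting it to wchar_t's.  Returns 0 if the string contains an
 * invalid multi-byte sequence for the current locale.  Strings with
 * more than MAX_WIDE - 1 characters are truncated.
 */
size_t count_chars(const char *mbs)
{
    wchar_t wide[MAX_WIDE];
    const wchar_t *p;
    size_t count = 0;
    size_t n = mbstowcs(wide, mbs, MAX_WIDE - 1);

    if (n == (size_t)-1)
        return 0;            /* invalid multi-byte sequence */
    wide[n] = L'\0';         /* mbstowcs may not terminate a full buffer */

    /* Every element is one character, so a plain pointer walk works. */
    for (p = wide; *p != L'\0'; p++)
        count++;
    return count;
}
```

In the default "C" locale each byte of a plain ASCII string is one
character, so count_chars("hello") gives 5; under a locale with a
multi-byte encoding the count can be smaller than strlen() reports.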

You can also pass them to the is*() and to*() functions provided
you've setlocale() to a locale that supports additional
characters.  If you look at sections 4.3 and 4.4, you will see
that they are all locale dependent.
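
A hedged sketch of locale-dependent classification over a wchar_t
array follows.  It assumes the isw*() functions from <wctype.h>, which
are not in the draft discussed here but were added to the standard
later; the plain is*() functions as finally standardized only accept
values representable as unsigned char (or EOF).  The name
count_letters is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Count the alphabetic characters in a wide string.  Which characters
 * count as alphabetic depends on the LC_CTYPE category of the current
 * locale, so a program would normally call setlocale(LC_CTYPE, "")
 * once at startup before using this.
 */
size_t count_letters(const wchar_t *ws)
{
    size_t letters = 0;

    for (; *ws != L'\0'; ws++)
        if (iswalpha((wint_t)*ws))
            letters++;
    return letters;
}
```

In the default "C" locale only the basic letters qualify, so
count_letters(L"ab1c") gives 3.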

----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.uucp
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"Save the Quayles" - Mark Russell