Path: utzoo!attcan!uunet!sdrc!scjones
From: scjones@sdrc.UUCP (Larry Jones)
Newsgroups: comp.lang.c
Subject: Re: Programming and international character sets.
Message-ID: <427@sdrc.UUCP>
Date: 2 Nov 88 23:04:01 GMT
References: <532@krafla.rhi.hi.is> <8804@smoke.BRL.MIL> <207@jhereg.Jhereg.MN.ORG>
Organization: Structural Dynamics Research Corp., Cincinnati
Lines: 30

In article <207@jhereg.Jhereg.MN.ORG>, mark@jhereg.Jhereg.MN.ORG (Mark H. Colburn) writes:
> [a number of misconceptions about draft ANSI C multi-byte chars:
> you can't pass them to is*() or to*(), can't tell how long they
> are, can't walk through arrays of them conveniently, etc. and
> proposes cluttering up the library with a bunch of new functions
> to handle them "correctly"]

You seem to have missed a key point in the internationalization stuff:
you don't use multi-byte characters directly, you convert them into
wchar_t's using the functions in sections 4.10.7 and 4.10.8. wchar_t
is an integral type (probably short or int) that is large enough to
hold ANY character value. For example, the char 'A' might convert to
a wchar_t value of 65, and a multi-byte sequence representing a
Japanese character might convert to a wchar_t value of 12345.

Since wchar_t's are all the same size, you can have an array of them
that you walk through with pointers just like you're used to doing
with char arrays. You can also pass them to the is*() and to*()
functions, provided you've called setlocale() to select a locale that
supports the additional characters. If you look at sections 4.3 and
4.4, you will see that they are all locale dependent.
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.uucp
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"Save the Quayles" - Mark Russell
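A minimal sketch of the convert-then-walk approach, assuming a hosted C89-or-later implementation: mbstowcs() (section 4.10.8) turns the multi-byte string into a wchar_t array, an ordinary pointer walks it one character per step, and mblen() (section 4.10.7) shows that you can also tell how long each multi-byte character is without converting. The helper names to_wide(), count_wchar(), and mb_char_count() are hypothetical, made up for this illustration. It sticks to conversion and pointer walking and does not exercise the is*()/to*() functions. Call setlocale(LC_ALL, "") first if you want the user's locale; otherwise the program runs in the "C" locale.

```c
#include <assert.h>
#include <stdlib.h>   /* mbstowcs, mblen, MB_CUR_MAX, malloc, free, wchar_t */
#include <string.h>   /* strlen */

/* Hypothetical helper: convert a null-terminated multi-byte string to a
   newly allocated wchar_t array. Stores the number of wide characters
   in *len and returns the array (caller frees), or NULL on an invalid
   multi-byte sequence or allocation failure. */
static wchar_t *to_wide(const char *mb, size_t *len)
{
    size_t cap;
    size_t n;
    wchar_t *buf;

    assert(mb != NULL && len != NULL);
    /* Every character uses at least one byte, so strlen(mb) + 1 slots
       always suffice for the characters plus the terminating L'\0'. */
    cap = strlen(mb) + 1;
    buf = malloc(cap * sizeof *buf);
    if (buf == NULL)
        return NULL;
    n = mbstowcs(buf, mb, cap);
    if (n == (size_t)-1) {        /* invalid sequence for this locale */
        free(buf);
        return NULL;
    }
    *len = n;                     /* n < cap, so buf is L'\0'-terminated */
    return buf;
}

/* Hypothetical helper: every wchar_t is the same size, so a plain
   pointer walk visits exactly one whole character per step. */
static size_t count_wchar(const wchar_t *ws, wchar_t target)
{
    size_t count = 0;
    const wchar_t *p;

    for (p = ws; *p != L'\0'; p++)
        if (*p == target)
            count++;
    return count;
}

/* Hypothetical helper: count characters (not bytes) of a multi-byte
   string directly with mblen(). Returns (size_t)-1 on an invalid
   sequence. */
static size_t mb_char_count(const char *mb)
{
    size_t count = 0;
    int step;

    (void)mblen(NULL, 0);         /* reset shift state, if the encoding has one */
    while ((step = mblen(mb, MB_CUR_MAX)) > 0) {
        mb += step;               /* skip the whole multi-byte character */
        count++;
    }
    return step == 0 ? count : (size_t)-1;   /* 0 means we reached '\0' */
}
```

For example, to_wide("banana", &len) should give len == 6, and count_wchar() on the result with L'a' returns 3. The buffer is sized from strlen() rather than by asking mbstowcs() for the length up front, because passing a null destination to mbstowcs() is an extension (POSIX allows it) and not something the C standard itself promises.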