Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!usc!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!dkuug!dkuugin!keld
From: keld@login.dkuug.dk (Keld Jørn Simonsen)
Newsgroups: comp.std.c
Subject: Re: wchar_t values
Message-ID:
Date: 31 Mar 91 16:22:14 GMT
References: <990@sranha.sra.co.jp> <1006@sranha.sra.co.jp>
Sender: news@slyrf.dkuug.dk
Lines: 66

erik@srava.sra.co.jp (Erik M. van der Poel) writes:

>As several people have guessed, the real reason for bringing up the
>wchar_t issue is because I am wondering how ISO 10646 can be used in
>the C language. Personally, I think that we should use it as follows:

>	C	ISO DIS 10646/4		wchar_t
>	L'c'	032/032/032/099		000/000/000/099
>	L'\t'	009/128/128/128		000/000/000/009

>I think that this is the most reasonable way to do it since it seems
>to conform to ANSI C.

Erik writes: ANSI C does not handle 10646 properly -> let's change 10646!
I do not think this is the right way of reasoning. ANSI C does not
handle DIS 10646, JIS X 0208, GB 2312 and KSC 5601 correctly. So the
ANSI C multibyte specifications *cannot* be used with any de jure
multibyte character set. That seems to me to be a fault in ANSI C.

Also, the character standards should be the base standards, and
programming language standards should build on these and provide
appropriate functionality to cover the standard character sets.
If another programming language, or maybe some communication
standard, then has other requirements for a universal character
standard, should the character standard also be changed to
accommodate that use? And what if the different requirements are
contradictory, should that lead to different character set standards?
Well, that is what happened in the past, with the ISO 646 and 8859
standards in programming languages and 6937/T.61 in the communications
world. I hope that this problem will be a historical one with the
appearance of 10646.

>However, I don't really care what encoding we use for wchar_t, as long
>as implementors who wish to use 10646 for wchar_t all agree on one
>encoding. So we should create an international standard that specifies
>how to use 10646 as a processing code in C. If this spec appears some
>time after 10646 becomes an IS, implementors might do things
>differently. So the spec should appear together with 10646. Perhaps in
>a normative annex in 10646?

It could also appear in the ISO C addendum that is being worked on
by WG14. I think that is the most natural place. As a base standard
for other JTC1 work, 10646 should not reference the ISO C standard.

I have some ideas on how to solve it in C:

1. Include a table in the runtime library for mapping ASCII characters
   into the current execution character set. This table is changed
   by each new call to setlocale(). L'c' then refers to the table entry
   for ASCII 'c', which holds the current wchar_t value of 'c'.
   Efficiency: quite good, just a pointed-to value instead of an
   immediate value. For wide characters this may even be without any
   loss, as the wide character value may have to be stored in a 2- or
   4-byte location anyway.

2. Have a function which returns a character from a charmap name
   (POSIX term). This has the generality that not only ASCII
   characters can be handled in this way. For example, a character
   such as C-cedilla can also be tested for in this way.
   Efficiency: less good, as it needs a function call and a table
   lookup on a name (hashed or the like).

Maybe we should have both ways of handling the identity of wide
characters.

Keld Simonsen
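
The two approaches listed above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not a proposal
text. The names switch_locale(), wide_of_ascii() and
wide_from_charmap_name(), and the charmap name strings, are
hypothetical. The table for idea 1 is filled with btowc(), which was
only standardized later in <wchar.h>, and this assumes the locale's
single-byte codes 0..127 agree with ASCII. The table for idea 2 assumes
an implementation whose wchar_t holds ISO 10646 code values, and a
linear scan stands in for the hashed lookup.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Idea 1: table from ASCII code to the wide value in the current locale. */
static wchar_t ascii_to_wide[128];

/* Refill the table; called after every successful locale change. */
void rebuild_ascii_table(void)
{
    for (int c = 0; c < 128; c++) {
        wint_t w = btowc(c);
        /* Codes with no wide equivalent are stored as 0 in this sketch. */
        ascii_to_wide[c] = (w == WEOF) ? (wchar_t)0 : (wchar_t)w;
    }
}

/* Hypothetical wrapper: change LC_CTYPE, then refresh the table. */
int switch_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    rebuild_ascii_table();
    return 0;
}

/* What L'c' would become: a table read instead of an immediate value. */
wchar_t wide_of_ascii(int c)
{
    assert(c >= 0 && c < 128);
    return ascii_to_wide[c];
}

/* Idea 2: look a character up by a (hypothetical) charmap name. */
struct charmap_entry {
    const char *name;
    wchar_t value;   /* assumed to be an ISO 10646 code value */
};

static const struct charmap_entry charmap[] = {
    { "tab",       0x0009 },
    { "c",         0x0063 },
    { "C-cedilla", 0x00C7 },
};

/* Returns 0 and stores the value in *out, or -1 if the name is unknown. */
int wide_from_charmap_name(const char *name, wchar_t *out)
{
    for (size_t i = 0; i < sizeof charmap / sizeof charmap[0]; i++) {
        if (strcmp(charmap[i].name, name) == 0) {
            *out = charmap[i].value;
            return 0;
        }
    }
    return -1;
}
```

A caller would run switch_locale("C") once and then compare input
against wide_of_ascii('c'), or against the value returned by
wide_from_charmap_name("C-cedilla", &w) for characters outside ASCII.
The first costs one array read per use; the second costs a call and a
name search, which matches the trade-off described above.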