Xref: utzoo comp.std.c:1060 comp.std.internat:487
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!oliveb!sun!shochu!kuro
From: kuro%shochu@Sun.COM (Teruhiko Kurosaka - Sun Intercon)
Newsgroups: comp.std.c,comp.std.internat
Subject: Hex escape for quoted multibyte character
Keywords: ANSI, hexadecimal, escape, multibyte, wide character, quote
Message-ID: <101058@sun.Eng.Sun.COM>
Date: 25 Apr 89 18:18:40 GMT
Sender: news@sun.Eng.Sun.COM
Lines: 60

I have a question about the relationship among three new concepts and
notations introduced by the ANSI-C draft: multibyte characters, wide
characters, and hexadecimal escape notation.

For the following discussion, let's assume a character X is a multibyte
character and is represented by a three-byte sequence, 0x8e 0xab 0xcd,
on some system.

The first question I have is how to represent this three-byte character
by a hexadecimal escape sequence within a double-quoted string. The
draft (12/7/88 p.30 line 14) says:

    The hexadecimal digits that follow the backslash and the letter x
    in a hexadecimal escape sequence are taken to be part of the
    construction of a single character for an integer character
    constant or of a single wide character for a wide character
    constant. The numeric value of the hexadecimal integer so formed
    specifies the value of the desired character or wide character.

If I take this literally, it would be:

    char *the_multibyte_char="\x8eabcd";        /* I-1 */

However, I noticed that the draft sometimes uses the words "character"
and "byte" interchangeably. If "character" actually means a byte, then

    char *the_multibyte_char="\x8e\xab\xcd";    /* I-2 */

must be the right notation. What I mean here is:

    char the_multibyte_char_array[]={0x8e, 0xab, 0xcd, 0};
    char *the_multibyte_char=the_multibyte_char_array;

Another related question is how to use the hexadecimal escape in a
wide character string ( L"..." ). Let's say the wide character value
for this character X is 0xbcde.
Then, should a wide character string that includes only the one
character X be written as:

    wchar_t *the_wide_char_str=L"\xbcde";       /* II-1 */

or should it be:

    wchar_t *the_wide_char_str=L"\xbc\xde";     /* II-2 */

to mean:

    wchar_t the_wide_char_array[]={0xbcde, 0};
    wchar_t *the_wide_char_str=the_wide_char_array;

?

And finally, which is right?

    wchar_t the_wide_char=L'\xbcde';            /* III-1 */
    wchar_t the_wide_char=L'\xbc\xde';          /* III-2 */

My personal choices are I-2, II-1 and III-1. This is based on my
personal belief that a hexadecimal escape sequence should describe the
value of the 'atom' element in a notation. Because a double-quoted
string is of type (char *), its atom's datatype is char, which actually
means a byte for historical reasons all of you know. Therefore an
escape sequence there should describe a byte. For the same reason, a
hexadecimal escape sequence within a wide character constant or wide
string literal should describe a wide character.

I would like to know what other people think about this. In your
response, please distinguish what you think ANSI-C should have been
from how the ANSI-C spec (draft) should be interpreted.

Thank you in advance.

-T.Kurosaka, Sun Microsystems
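
The intended meanings spelled out above can be checked on a particular
compiler with a short test. The following is only a sketch: the
function and array names are made up for illustration, it assumes an
8-bit char and a wchar_t of at least 16 bits, and it reports what one
implementation does rather than what the draft requires. I-1 is left
out because a value such as 0x8eabcd cannot fit in a single 8-bit char.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Intended contents, as given in the post (array names are illustrative). */
static const unsigned char intended_bytes[] = { 0x8e, 0xab, 0xcd, 0 };
static const wchar_t intended_wide[] = { 0xbcde, 0 };

/* I-2: one hexadecimal escape per byte. Returns 1 if the literal
   holds exactly the three intended bytes plus the terminator. */
int narrow_i2_matches(void)
{
    static const char s[] = "\x8e\xab\xcd";
    return sizeof s == sizeof intended_bytes
        && memcmp(s, intended_bytes, sizeof s) == 0;
}

/* II-1: one hexadecimal escape for the whole wide character.
   Returns 1 if the literal holds a single element equal to 0xbcde. */
int wide_ii1_matches(void)
{
    static const wchar_t w[] = L"\xbcde";
    return wcslen(w) == 1 && w[0] == intended_wide[0];
}

/* II-2: two escapes. Returns how many wide characters the literal holds. */
size_t wide_ii2_length(void)
{
    static const wchar_t w[] = L"\xbc\xde";
    return wcslen(w);
}

/* III-1: a wide character constant written with one escape.
   Returns 1 if its value equals 0xbcde. */
int wide_iii1_matches(void)
{
    wchar_t c = L'\xbcde';
    return c == intended_wide[0];
}

/* Aborts at the first notation that does not behave as the post intends. */
void check_all(void)
{
    assert(narrow_i2_matches());
    assert(wide_ii1_matches());
    assert(wide_ii2_length() == 2);  /* II-2 yields two elements, not one */
    assert(wide_iii1_matches());
}
```

Calling check_all() from main shows whether I-2, II-1 and III-1 produce
the intended arrays on that implementation, and whether II-2 produces
two wide characters instead of one.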