Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!brl-adm!brl-smoke!smoke!gwyn@BRL.ARPA
From: gwyn@BRL.ARPA (VLD/VMB)
Newsgroups: net.lang.c
Subject: Re: uses of void
Message-ID: <3212@brl-smoke.ARPA>
Date: Thu, 21-Aug-86 13:11:49 EDT
Article-I.D.: brl-smok.3212
Posted: Thu Aug 21 13:11:49 1986
Date-Received: Thu, 21-Aug-86 22:21:50 EDT
Sender: news@brl-smoke.ARPA
Lines: 55

The idea is not to insist that all C implementations support 16-bit character codes, but to permit that as a deliberate implementation decision. I would hope that the systems I use continue to have 8-bit chars, although I do have to say that quite often I need to know more about a "letter" than just its class name ("A"). For example, these days I work a lot with typesetting, bitmap graphics, and so forth, and sometimes the size and font style of a character are about as important as its class name.

Of course, it is possible to come up with any number of kludges to cope with non-traditional DP ideas of characters, and many people have done so. The desire, I think, in specifying the C language is to not have such kludges intrude into implementations where they are neither wanted nor needed.

As an example of the problems, the AT&T 16-bit proposal requires strcpy() to handle "escapes", whereas what one really ought to insist on is that strcpy() handles chars as in the following simple semantic definition of its function:

	char *strcpy( char *dest, char *src )
	{
		char *retval = dest;

		while ( (*dest++ = *src++) != '\0' )
			;
		return retval;
	}

If one adopts a kludge approach, then either strcpy() can no longer be used to copy a string of text characters, or strcpy() no longer has such a simple implementation.
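The loop in that semantic definition does not depend on how wide each element is. As a rough sketch only, and not something taken from any vendor's proposal, here is the same loop written for a 16-bit element type. The names wide_char and wide_copy are hypothetical, chosen for illustration, and uint16_t from <stdint.h> stands in for a "char" that holds 16 bits.

```c
#include <stdint.h>

/* Hypothetical 16-bit character type, standing in for a wide "char". */
typedef uint16_t wide_char;

/* Same loop as the strcpy() model above, applied to 16-bit elements.
 * Copies up to and including the terminating zero element. */
wide_char *wide_copy(wide_char *dest, const wide_char *src)
{
    wide_char *retval = dest;

    while ((*dest++ = *src++) != 0)
        ;
    return retval;
}
```

Because every text character occupies exactly one element, the copy needs no knowledge of escapes or shift states; only the element type has changed.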
If "char" is able to hold 16 bits, then the semantics of strcpy() can continue to be the simple model shown above, and strcpy() can copy strings of text characters (this assumes that one would always use 16 bits per character, even if the 7-bit ASCII code subset could be used for some of them).

The worst example I have heard so far about international character set kludgery is the assertion that strcmp() should be useful for sorting native-language text into "dictionary order". Anyone who knows much about dictionaries (particularly oriental ones) should appreciate how naïve that approach is.

One vendor would really like to see a requirement to support in effect multiple 8-bit translation tables, because that vendor has already taken that particular approach and would have a competitive edge if it were made mandatory.

I've found that most major UNIX system vendors are currently grappling with internationalization issues, and all have taken different approaches. So far they all smack of kludgery to me, although some are better than others. What concerns me is, if some simple, clean sufficient solution to the extended code set issue is not made part of the official C language specification, demands from ISO will lead to an officially-required kludge approach that will adversely impact even simple ASCII-based implementations.
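On the strcmp() point above, a small illustration may help. strcmp() orders strings by the numeric values of their chars and nothing else, so even plain English falls out of dictionary order as soon as letter case is mixed. The sketch below assumes an ASCII execution character set and the default "C" locale. The helper names code_order_before and caseless_compare are hypothetical. Case folding is only one of the many rules a real dictionary ordering would need.

```c
#include <ctype.h>
#include <string.h>

/* Nonzero if strcmp() places a before b, i.e. pure code-value order. */
int code_order_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}

/* Hypothetical case-folding comparison: negative, zero, or positive,
 * in the manner of strcmp(), but ignoring letter case. */
int caseless_compare(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}
```

In ASCII, 'Z' is 90 and 'a' is 97, so code_order_before("Zebra", "apple") is true although a dictionary lists "apple" first. caseless_compare("Zebra", "apple") is positive, which matches the dictionary for this one pair but still knows nothing of accents, digraphs, or non-alphabetic scripts.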