Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!munnari.oz.au!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.lang.c
Subject: Re: Internationalisation (was: NULL as a string terminator)
Message-ID: <3617@goanna.cs.rmit.oz.au>
Date: 26 Aug 90 11:38:23 GMT
References: <24141@megaron.cs.arizona.edu> <134@blekko.UUCP> <1990Aug24.064203.20942@icc.com>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 132

In article <1990Aug24.064203.20942@icc.com>, cbp@icc.com (Chris Preston) writes:
> If you reaaaaaly want the text in the source section (incidentally, xscc on
> System V [your original example] does invoke the C preprocessor

No, xscc was *not* my example nor anyone else's in this thread before this.
I mentioned System V Release 4, to be sure, but I did not mention xscc.
How on earth is using xscc supposed to help me use the same message file
for C, Pascal, Fortran, and Lisp?

> so text substitution is absolutely not broken under MNLS

Whoever said it was?

> Another method would be to do something like the following (assuming that
> you are invoking the C preprocessor):
> #define DCOM_ERR 0
> #define DRVR_ERR 1 /* etc. etc. */
> char *ErrMsg[]={
> #if DOS
> "Run dcom.com",
> "Run driver.com",
> #elif UNIX
> "Datacomm not initialized, contact S/A",
> "Driver error, contact S/A",
> #else
> "Datacomm not running",
> "Driver not responding",
> #endif
> };

Again, this technique means that you need the sources, and that to change
the messages you need access to the sources and to recompile. That was
an objection validly raised against the stripped-down message file
technique I posted, and it applies with greater force to this.

> So, we have accomplished coding for purposes of internationalization,
> either way, we have separated string literals to a central place,
> and we have made the code more maintainable, since changes in messages for
> the environment can occure at one major juncture, and life is a cabaret.
The point of a message file is that
 -- the "central place" is OUTSIDE THE PROGRAM
 -- a message file can be got at by someone with no (other) access to
    sources (this is a *big* deal for developers!)
 -- *one* version of the object file can be shared by people using
    *different* message files.

> >As for efficiency, the point is that we are talking about a scheme for
> >generating messages for display to humans. The cost of fishing the text
> >out of a file is (or was every time I measured it) considerably less than
> >the cost of displaying it on the terminal.
>
> Considering the program that pays no concern for "internationalization"
> does not have to source anything external to it's data segment at any
> time other than normal operations, to say that the additional overhead is
> equal to or less than existing overhead is a non-sequitor. If you
> don't do it the cost ain't there.

That's non-sequitUr, and this "rebuttal" is badly flawed.
What I claimed was

    (cost of fetching message) << (cost of displaying message)

Someone with measurements to disprove this can refute me (for a particular
hardware/software combination) by displaying his figures.

Of course, what is *really* interesting about this "rebuttal" is that in
a virtual memory environment it simply isn't true. We're talking about
messages here, things which are displayed at relatively infrequent (we
hope!) intervals. Text, in short, which is paged OUT. In a system
which supports memory-mapped files (VMS, Aegis, SunOS 4.x, AIX, ...) one
could open the message file as a memory-mapped file, and then the process
of fetching a message from the message file would cost no more than the
process of fetching a message from a pre-initialised character array,
because the two would be exactly the same process.
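
A minimal sketch of the memory-mapped approach described above, assuming a
POSIX system with mmap() and a hypothetical file layout of one message per
line, with messages numbered from 0. The names msgfile, msg_open, msg_get,
msg_print, and msg_close are invented for illustration; they are not part
of any library mentioned in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout (an assumption of this sketch): plain text,
 * one message per line, message number n is the n-th line from 0. */
struct msgfile {
    const char *base;   /* start of the read-only mapping */
    size_t      len;    /* size of the mapping in bytes */
};

/* Map the message file read-only. Returns 0 on success, -1 on failure. */
int msg_open(struct msgfile *mf, const char *path)
{
    struct stat st;
    void *p;
    int fd;

    assert(mf != NULL && path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);              /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return -1;
    mf->base = p;
    mf->len = (size_t)st.st_size;
    return 0;
}

/* Locate message n. On success returns a pointer into the mapping and
 * stores the length (newline excluded) in *len; the text is NOT
 * NUL-terminated. Returns NULL if there is no such message. */
const char *msg_get(const struct msgfile *mf, size_t n, size_t *len)
{
    const char *p, *end;

    assert(mf != NULL && len != NULL);
    p = mf->base;
    end = mf->base + mf->len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *stop = nl ? nl : end;
        if (n == 0) {
            *len = (size_t)(stop - p);
            return p;
        }
        n--;
        p = nl ? nl + 1 : end;
    }
    return NULL;
}

/* Write message n, followed by a newline, to out. Returns 0 or -1. */
int msg_print(const struct msgfile *mf, size_t n, FILE *out)
{
    size_t len;
    const char *m = msg_get(mf, n, &len);

    if (m == NULL)
        return -1;
    if (fwrite(m, 1, len, out) != len || fputc('\n', out) == EOF)
        return -1;
    return 0;
}

/* Release the mapping. */
void msg_close(struct msgfile *mf)
{
    assert(mf != NULL);
    munmap((void *)mf->base, mf->len);
    mf->base = NULL;
    mf->len = 0;
}
```

A caller would msg_open() the file once at startup and then call
msg_print(&mf, DRVR_ERR, stderr) where it would otherwise index ErrMsg[].
The linear scan in msg_get() is the simplest thing that works; a real
message file would more plausibly carry an offset table so that lookup
does not touch pages it does not need.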
> It has been pointed out here by several that are in the know on these
> things, that arguing about string literals is moot in comparison to other
> inherent difficulties presented by internationalization, and that the
> necessary crusade to "C programming practices" is long a commin'.

That is why, for example, ANSI C has wchar_t, wcstombs(), mbstowcs(),
mblen(), and so on, and why it is set up to allow multi-byte characters
in constants.

> >There's four negative impacts of the #ifdef approach, just for starters.
> Given the above examples, do you still feel this to be the case?

Of course. Those four negative impacts still stand.

> I do not think so. I also believe that this shows that it is an unsafe
> practice to say that something cannot be done within the framework of C
> and the C preprocessor.

Again, who said _that_? Not me! That there are *better* ways to do
some things than using the C preprocessor, who can challenge that? The
only question is, _which_ tasks? Given that I said I would like to share
message files between several programming languages, using a facility
peculiar to one of them (there is no guarantee that /usr/lib/cpp will be
available nor anything like it) would be rather silly, wouldn't it?

A serious problem concerned with "the need to make the texts we write
for the tools that count work with more than one tongue of men"
(otherwise known as "internationalisation" if you have no fear of words
that have more than one sound in them) is that C formats don't quite
work. One common problem is that different languages put phrases in
different orders. The X/Open answer to that is to have an extra piece
of information in %format controls, saying which argument to use. I
presume that the ANSI C committee considered that, and didn't include it
because it basically needs pointers and integers to be the same size.

The following suggestion is not altogether serious.
But bearing in mind things like wanting to put phrases in different
orders, and all sorts of things one might like to let customers
configure for themselves (without having to give them *all* the
sources), it might not be as crazy as it sounds.

How about using TCL (Tool Command Language) for "messages"? TCL is a
free "extension language" which somewhat resembles the Unix shells, and
is set up to be a *small* library that can be linked into C code. When
one wants to report an event, one could format the arguments of that
event into strings, fetch a TCL command from a file, and execute that
TCL command. It was intended to customise input to things like the
editor "mx", but there's no reason it couldn't be used to customise
*output*.

As I say, not altogether serious.
--
The taxonomy of Pleistocene equids is in a state of confusion.