Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uunet!sdrc!cinnet!icc!cbp
From: cbp@icc.com (Chris Preston)
Newsgroups: comp.lang.c
Subject: Re: Internationalisation (was: NULL as a string terminator)
Message-ID: <1990Aug24.064203.20942@icc.com>
Date: 24 Aug 90 06:42:03 GMT
References: <24141@megaron.cs.arizona.edu> <134@blekko.UUCP> <1881@jura.tcom.stc.co.uk> <3603@goanna.cs.rmit.oz.au>
Organization: Intercomputer Communications Corp., Cincinnati, OH
Lines: 231

In article <3603@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1881@jura.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
>> In article <3585@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>> }For why?  Internationalisation, _that's_ for why.
>
>> I cringe when I see this (unwords like "internationalisation", I mean).
>
>One uses language for the purpose of communication.

etc., deleted.

>
>> Also I fail to see your point. Surely such #ifdef switching
>> as above is more efficient, simpler to maintain and more legible than
>> the scrabbling about with resource files you prefer?
>
>So now Cn James reads minds and knows what I prefer.  Wonderful just.
>No, it is *not* simpler to maintain.  The point of the resource file
>approach (not my invention by any means; no-hopers like IBM, DEC, HP,
>X/Open, AT&T, Apple, ... have been using it for a while and I just
>copied the idea and simplified it a bit for this newsgroup) is that
>you have all the text in one place; you don't have to go "scrabbling
>about" in the source files to find all the strings.  You can give the
>resource file to a human translator who knows nothing about the
>programming language you are using.  A minor addition to such a tool
>(have it generate
>    INTEGER MSGNO
>    PARAMETER (MSGNO=......
>instead of #defines) will let you use the *same* message file with a
>Fortran program.
>Speaking as a no-hoper, I must admit that using a
>technique that adapts to *all* the programming languages I use, not
>just C, sounds like a saving.  But what do I know?

Indeed, an interesting proposition.

There are two immediate ways (I am sure the creative will have more
still) that work with internationalization while using labels, allow
both extraction tools to work, are simple to implement, and prevent
the repetitious use of literals and constants.  Here goes:

If you reaaaaaly want the text in the source section (incidentally,
xscc on System V [your original example] does invoke the C
preprocessor, so text substitution is absolutely not broken under
MNLS, and any extractor that does not invoke the preprocessor should
be considered broken) -

    #define DOS_DCOMM_MSG   1
    #define UNIX_DCOMM_MSG  2
    #define DEF_DCOMM_MSG   3

    #if DOS
    #define DCOM_ERR_MSG    DOS_DCOMM_MSG
    #elif UNIX
    #define DCOM_ERR_MSG    UNIX_DCOMM_MSG
    #else
    #define DCOM_ERR_MSG    DEF_DCOMM_MSG
    #endif

    #define DCOM_ERR        getmsg(DCOM_ERR_MSG)

    /* tools.c */
    char *
    getmsg(ErrMsg)
    int ErrMsg;
    {
        /* map a message number to its text; all literals live here */
        switch (ErrMsg) {
        case DOS_DCOMM_MSG:
            return "Run dcom.exe";
        case UNIX_DCOMM_MSG:
            return "Datacomm not initialized, contact S/A";
        case DEF_DCOMM_MSG:
            return "Datacomm not running";
        default:
            return "Run for cover, they're commin' to get us";
        }
    }

    /* somefile.c */
    int
    CheckDatacomm()
    {
        int RetVal;

        if ((RetVal = DataCommRunning()) != 0)
            (void) fprintf(stderr, "%s\n", DCOM_ERR);
        return RetVal;
    }

    /* Makefile */
    LANG = de fr sw gr

    neatunix: main.o somefile.o tools.o
        xscc -O main.o somefile.o tools.o -o neatunix
        @for i in $(LANG); do gencat $@.X $$i.cat; done

    neatdos: main.o somefile.o tools.o
        xscc -O main.o somefile.o tools.o -o neatdos
        @dosomethingelsealtogether

Another method would be to do something like the following (assuming
that you are invoking the C preprocessor):

    #define DCOM_ERR 0
    #define DRVR_ERR 1
    /* etc. etc.
    */

    char *ErrMsg[] = {
    #if DOS
        "Run dcom.com",
        "Run driver.com",
    #elif UNIX
        "Datacomm not initialized, contact S/A",
        "Driver error, contact S/A",
    #else
        "Datacomm not running",
        "Driver not responding",
    #endif
    };

    #define MSG_ERR_DCOM ErrMsg[DCOM_ERR]
    #define MSG_ERR_DRVR ErrMsg[DRVR_ERR]

    int
    foo()
    {
        int Dcm, Dvr;
        .
        .
        .
        if (!Dcom())
            printf("%s", MSG_ERR_DCOM);

        if (SomeDriverCheck() == FAILURE)
            printf("%s", MSG_ERR_DRVR);
        .
        .
        .
        return somevalue_etc;
    }

So, either way, we have accomplished coding for purposes of
internationalization: we have separated string literals to a central
place, and we have made the code more maintainable, since changes in
messages for the environment can occur at one major juncture, and
life is a cabaret.

(BTW, all the above just got retyped at max speed, so errors are
surely there and to be expected; the point remains.)

>
>As for efficiency, the point is that we are talking about a scheme for
>generating messages for display to humans.  The cost of fishing the text
>out of a file is (or was every time I measured it) considerably less than
>the cost of displaying it on the terminal.

Considering that the program that pays no concern for
"internationalization" does not have to source anything external to
its data segment at any time other than normal operations, to say
that the additional overhead is equal to or less than existing
overhead is a non sequitur.  If you don't do it the cost ain't there.

>
>The real schemes (such as the X/Open one) identify messages by numbers,
>not by address in the text file.  That has the disadvantage that finding
>the right text is a wee bit more complex (but not very; one need merely
>attaches a directory at the end of the file), but it has the great
>advantage that the program does not need to be recompiled.
>This means
>that one customer can be running the program with messages coming from
>the "English-speaking idiot" message file and another with messages
>coming from the "Spanish-speaking wizard" message file, and both can be
>sharing the same copy of the program without any recompilation at all.

like MNLS, perhaps?

>
>That's the way it *is* in UNIX System V Release 4.  We might as well get
>used to thinking about messages in that way now.

and it is not such a horrible thing.  Just think, we can pop streams
modules for the simple stuff, and run extractors and programs to
modify the source for multibyte character sets, and use different
curses libraries for right to left output.  What a treasure.

It has been pointed out here by several who are in the know on these
things that arguing about string literals is moot in comparison to
other inherent difficulties presented by internationalization, and
that the necessary crusade to "C programming practices" is long a
commin'.

For instance, I am told that the following is a problem in Kanji

    char p[10];     /* xscc provides for allowing twenty bytes as needed in Kanji */
    *(p+1)='x';     /* this is the next byte, and an error */
    p[n+1]='x';     /* this is the next _character_ and ok */

Given trivial differences like this, I am sure that there are many
things "broken" for internationalization, and we should all prepare
to cringe; however, substitution for string literals and constants is
not one of them.

>
>> Demonstrate to me a negative impact on internationalisation (ugh) and I
>> might believe you.  Any negative impact will do, I'm not too choosy.
>
>The schemes actually used by IBM (MVS, CMS, AIX) HP (HP-UX), DEC (VMS,
>Ultrix), AT&T (SVR4) and others essentially add another couple of layers
>of indirection above what I presented.  Those systems all allow you to
>switch languages at run time, without any recompilation.  Those systems
>all allow you to translate message files without having any other access
>to the sources.
>They all allow many programs, and many programming
>languages, to share the same message files.  They all allow a customer
>to substitute his own translation of a message file (perhaps amplifying
>some messages, or getting the grammar right, or ...) without access to
>the sources.

And still can.  xscc in Unix System V (your example) does all of this
for you.  You need not make the resource catalogues.  It is done for
you.

>
>There's four negative impacts of the #ifdef approach, just for starters.

Given the above examples, do you still feel this to be the case?  I
do not think so.  I also believe that this shows that it is an unsafe
practice to say that something cannot be done within the framework of
C and the C preprocessor.

cbp
--------
Of course these are opinions.
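
P.S.  For comparison, a rough sketch of the run-time flavour that the
quoted text describes (messages identified by number, language chosen
while the program runs, no recompilation).  It assumes an in-memory
table standing in for real catalogue files.  The names lookup_msg,
catalogues, MSG_DCOM and MSG_DRVR, and the German text, are made up
for illustration; they are not MNLS or X/Open interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message numbers, in the spirit of DCOM_ERR and DRVR_ERR. */
enum { MSG_DCOM = 0, MSG_DRVR = 1, MSG_COUNT };

/* One table per language; a real system would read these from files. */
struct catalogue {
    const char *lang;
    const char *text[MSG_COUNT];
};

static const struct catalogue catalogues[] = {
    { "en", { "Datacomm not running", "Driver not responding" } },
    { "de", { "Datacomm laeuft nicht", NULL } },  /* MSG_DRVR untranslated */
};

/*
 * Return the text of message msgno in language lang.  If the language
 * is unknown or the message is untranslated, fall back to the first
 * catalogue ("en" here), and finally to the caller's default string.
 */
const char *
lookup_msg(const char *lang, int msgno, const char *dflt)
{
    size_t i, n = sizeof catalogues / sizeof catalogues[0];

    if (msgno < 0 || msgno >= MSG_COUNT)
        return dflt;

    for (i = 0; i < n; i++)
        if (lang != NULL && strcmp(catalogues[i].lang, lang) == 0 &&
            catalogues[i].text[msgno] != NULL)
            return catalogues[i].text[msgno];

    if (catalogues[0].text[msgno] != NULL)
        return catalogues[0].text[msgno];
    return dflt;
}
```

A caller would write something like
fprintf(stderr, "%s\n", lookup_msg(lang, MSG_DCOM, "Datacomm not running"));
with lang taken from wherever the user's language preference is kept.
The default-string argument means a missing translation degrades to
the built-in text rather than to a null pointer.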