Path: utzoo!attcan!uunet!cs.utexas.edu!longway!std-unix
From: guy@auspex.uucp (Guy Harris)
Newsgroups: comp.std.unix
Subject: Re: Standards Update, Recent Standards Activities
Message-ID: <771@longway.TIC.COM>
Date: 2 Jul 90 20:24:00 GMT
References: <387@usenix.ORG> <762@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: std-unix@uunet.uu.net
Organization: Auspex Systems, Santa Clara
Lines: 66
Approved: jsq@longway.tic.com (Moderator, John S. Quarterman)

>Both examples you supplied were simply ways to look up strings to output in
>a database keyed on locale and an internal program string; they differ only
>in minor ways.  Does either proposal address any of the *hard* issues?  For
>instance, different languages have different pluralization rules; how do
>you internationalize a program that automatically pluralizes when necessary
>(I hate programs that say things like "1 files deleted")?  Or what about
>differing word order; how would you internationalize
>
>	printf("the %s %s", adjective, noun);
>
>so that it would look right in a language where adjectives follow nouns?

The latter can be addressed by a scheme like the X/Open NLS scheme, in
which the conversion specifications in a "printf" format can be
decorated with a number saying which of the N arguments following the
format string should be used; the translated version of "the %s %s"
would have to replace "%s %s" with "%2$s %1$s".

HOWEVER:

This does *NOT* do anything about the pluralization rules.

It *also* does nothing about the fact that the correct translation of
"the" could depend on the noun in question; e.g., is it "la" or "le" in
French?

I think that, for reasons such as these, there is no Magic Bullet that
lets you trivially internationalize the message-printing code of
applications by throwing a simple-minded wrapper around "printf"
(whether the #define approach, or replacing the format string with
"getmsg(the format string)", or whatever).  The only thing that would
make such a wrapper work is software sufficiently knowledgeable about
*all* the human languages supported that it knows the gender of every
noun you'll use in your messages, knows the right articles for those
genders (for all cases the language has), and knows how to pluralize
arbitrary words.

In fact, I'm not even sure *that's* sufficient; I only know about some
Indo-European languages, and other languages may throw in problems I
haven't even considered.

In other words, I don't think there's a solution to the problem of "oh
dear, how are we going to get all our applications modified to put out
grammatically-correct messages in different languages without having to
examine all the code that generates messages and possibly rewrite some
of that code" other than teaching the system a fair bit about lots of
human languages.  I don't think you can even come up with an approach
that's close enough to a solution to be interesting.

I'm afraid you're just going to have to fall back on things such as:

	having "1 frob" and "%d frobs" be *two* separate messages in the
	message catalog;

	having "the chair" and "the table" either be two separate
	messages, rather than having "the %s" and foreign-language
	versions of same, or having the message be "%s %s" and having
	the database tie the noun and the article together (watch out
	for Russian, though; they don't *use* articles...);

etc.

Yeah, this may mean human intervention, rather than being able to
internationalize your messages by just running a few programs over the
code; nobody ever said that life was fair.  Might as well bite the
bullet....

Volume-Number: Volume 20, Number 86