Path: utzoo!attcan!uunet!cs.utexas.edu!longway!std-unix
From: guy@auspex.uucp (Guy Harris)
Newsgroups: comp.std.unix
Subject: Re: Standards Update, Recent Standards Activities
Message-ID: <771@longway.TIC.COM>
Date: 2 Jul 90 20:24:00 GMT
References: <387@usenix.ORG> <762@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: std-unix@uunet.uu.net
Organization: Auspex Systems, Santa Clara
Lines: 66
Approved: jsq@longway.tic.com (Moderator, John S. Quarterman)

From:  guy@auspex.uucp (Guy Harris)

>Both examples you supplied were simply ways to look up strings to output in
>a database keyed on locale and an internal program string; they differ only
>in minor ways.  Does either proposal address any of the *hard* issues?  For
>instance, different languages have different pluralization rules; how do
>you internationalize a program that automatically pluralizes when necessary
>(I hate programs that say things like "1 files deleted")?  Or what about
>differing word order; how would you internationalize
>
>	printf("the %s %s", adjective, noun);
>
>so that it would look right in a language where adjectives follow nouns?

The latter can be addressed by a scheme like the X/Open NLS scheme, in
which "printf" conversion specifications can be decorated with specifiers
that say which of the N arguments to "*printf" following the format
string should be used; a translation of "the %s %s" would replace
"%s %s" with "%2$s %1$s".
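
As a rough sketch of how that looks in practice (assuming a "printf"
family that supports the X/Open/POSIX numbered-argument extension,
which is not part of ISO C; "format_phrase" is a made-up helper, and the
format strings would normally come from a message catalog rather than
being passed in by hand):

```c
#include <stdio.h>

/*
 * Hypothetical helper: format an adjective/noun pair using a format
 * string that a translator controls.  The code always passes the
 * arguments in the same order (adjective, noun); the format string
 * decides the order in which they are printed.
 */
int format_phrase(char *buf, size_t size, const char *fmt,
                  const char *adjective, const char *noun)
{
    /* "%1$s" selects the first argument after fmt, "%2$s" the second. */
    return snprintf(buf, size, fmt, adjective, noun);
}
```

Called with "the %1$s %2$s" and ("red", "book") this should produce
"the red book"; called with "le %2$s %1$s" and ("rouge", "livre") it
should produce "le livre rouge".  Note that a single format string has to
use numbered specifications throughout; mixing "%s" and "%1$s" in one
string is not allowed.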

HOWEVER:

This does *NOT* do anything about the pluralization rules.  It *also*
does nothing about the fact that the correct translation of "the" could
depend on the noun in question; i.e., is it "la" or "le" in French?

I think that, for reasons such as these, the only solution to the
problem of trying to find a Magic Bullet so that you can trivially
internationalize the message-printing code of applications by throwing a
simple-minded wrapper around "printf" (whether the #define approach, or
replacing the format string with "getmsg(the format string)", or
whatever) is to have software that is sufficiently knowledgeable about
*all* human languages supported that it knows the gender of all nouns
you'll use in your messages, and knows the right articles for those
genders (for all cases the language has), and knows how to pluralize
arbitrary words.

In fact, I'm not even sure *that's* sufficient; I only know about some
Indo-European languages, and other languages may throw in problems I
haven't even considered.

In other words, I don't think there's a solution to the problem of "oh
dear, how are we going to get all our applications modified to put out
grammatically-correct messages in different languages without having to
examine all the code that generates messages and possibly rewrite some
of that code" other than teaching the system a fair bit about lots of
human languages.  I don't think you can even come up with an approach
that's close enough to a solution to be interesting.  I'm afraid you're
just going to have to fall back on things such as:

	having "1 frob" and "%d frobs" be *two* separate messages in the
	message catalog;

	having "the chair" and "the table" either be two separate
	messages, rather than having "the %s" and foreign-language
	versions of same, or having the message be "%s %s" and having
	the database tie the noun and the article together (watch out
	for Russian, though, they don't *use* articles...);

etc..
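
The "two separate messages" fallback might look something like the
following minimal sketch.  The message IDs, the "messages_en" table and
"format_frob_count" are all invented for illustration; a real program
would fetch the strings from a message catalog rather than from a static
array.

```c
#include <stdio.h>

/* Hypothetical message IDs: one for the singular, one for the plural. */
enum msg_id { MSG_FROB_ONE, MSG_FROB_MANY, MSG_COUNT };

/* Stand-in for a message catalog; each language would supply its own. */
static const char *const messages_en[MSG_COUNT] = {
    "1 frob",
    "%d frobs",
};

/*
 * Pick the whole message based on the count, instead of gluing an "s"
 * onto a noun.  The count is always passed; a message with no
 * conversion specification simply ignores it.
 */
int format_frob_count(char *buf, size_t size,
                      const char *const *catalog, int n)
{
    const char *fmt = catalog[n == 1 ? MSG_FROB_ONE : MSG_FROB_MANY];
    return snprintf(buf, size, fmt, n);
}
```

With "messages_en", a count of 1 should give "1 frob" and a count of 3
should give "3 frobs".  The "n == 1" test is itself an English-style
assumption baked into the code; a language with different pluralization
rules would need a different selection rule, not just different strings,
which is rather the point.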

Yeah, this may mean human intervention, rather than being able to
internationalize your messages just by running a few programs
over the code; nobody ever said that life was fair.  Might as well bite
the bullet.... 

Volume-Number: Volume 20, Number 86