Xref: utzoo comp.unix.programmer:1593 comp.lang.perl:4936 comp.std.internat:855
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!bu.edu!wang!news
From: rschwartz@OFFICE.WANG.COM (R. Schwartz@Wang R&D Net)
Newsgroups: comp.unix.programmer,comp.lang.perl,comp.std.internat
Subject: Re: Tools for manipulating message catalogs
Message-ID: <b3z9dw.atc@wang.com>
Date: 16 Apr 91 15:24:35 GMT
Sender: news@wang.com
Reply-To: rschwartz@office.wang.com
Organization: Mail to News Gateway
Lines: 126

eliot@chutney.rtp.dg.com (Topher Eliot) writes:

>  (much omitted)
>
> The moral of this is that ONE SHOULDN'T DO THINGS THAT REQUIRE MAINTAINING
> SYNCHRONIZATION BETWEEN THE APPLICATION AND THE MESSAGE CATALOG, like
> inserting a new message into the middle of an existing message catalog.
>
> (more omitted)
>
> You should never WANT automatic numbering of your messages.
>
> (still more omitted)
>
> Have I made my point clear?  Would anyone care to point out flaws in my logic?
> Does anyone still think that a tool to create a .h file out of a message
> catalog is useful?

      YES!!!   Your point is clear.
      YES!!!   I absolutely insist that generating .h files is required.

The flaws are not in your logic.  The flaws are in your assumptions about the
tools that should be used to synchronize code and messages when they reside
in separate files.  I.e., you presume that there are no such tools, and I
grant that it is normal for there to be none.  The dangers that you point out
are completely valid, and your point that these dangers are exacerbated by
the logistics involved in sending materials hither and yon for translation
is well taken.  But the solution isn't to make a bad software engineering
decision.  Invent the right tools instead!

The use of mnemonic names in message catalogs is an absolute necessity in
any application other than trivial toys.  Most of the benefits are too
obvious to mention.  One that bears special attention is the ability to
re-organize multiple catalogs without re-numbering.  If the run-time
organization of code changes from one release to the next, it may make
perfectly good sense to divide or merge message catalogs, or to re-locate
individual messages.  Mnemonic labels can minimize the code impact of such
changes.  I would even suggest going further and removing the code impact
completely by adding a level of indirection, so that code is unaware of
which message catalog a given message comes from.
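One way to picture that indirection is a generated table that maps each mnemonic to its current home. This is a minimal sketch, assuming XPG-style catalogs addressed by set and message number. The enumerators, catalog file names, the `MsgLocation` type and the `msg_where()` helper are all hypothetical and not part of any real library.

```c
#include <stddef.h>

/* Hypothetical mnemonic ids, as a generator tool might emit them. */
typedef enum {
    MSG_FILE_NOT_FOUND,
    MSG_DISK_FULL,
    MSG_PRINT_DONE,
    MSG_COUNT              /* number of messages; keep last */
} MsgId;

/* Where a message currently lives. */
typedef struct {
    const char *catalog;   /* hypothetical catalog name, e.g. for catopen() */
    int set;               /* message set within that catalog */
    int number;            /* message number within the set */
} MsgLocation;

/* Generated table: the only thing that changes when catalogs are split,
   merged, or individual messages are relocated. */
static const MsgLocation msg_location[MSG_COUNT] = {
    [MSG_FILE_NOT_FOUND] = { "errors.cat",   1, 1 },
    [MSG_DISK_FULL]      = { "errors.cat",   1, 2 },
    [MSG_PRINT_DONE]     = { "printing.cat", 1, 1 },
};

/* Callers ask for a mnemonic and never see the catalog layout.
   Returns NULL for an out-of-range id. */
const MsgLocation *msg_where(MsgId id)
{
    if ((int)id < 0 || id >= MSG_COUNT)
        return NULL;
    return &msg_location[id];
}
```

Moving `MSG_PRINT_DONE` into `errors.cat` as message 3 changes one row of the generated table and nothing in the callers, which keep saying `MSG_PRINT_DONE`.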

Another point that strongly supports the use of such a tool is that it
helps translators to identify their mistakes.  Comparing the .h file
generated from the translated catalog against the one generated from the
released catalog is a sure way to detect inadvertently deleted messages and
a host of other errors.  I haven't met a translator who wouldn't love to
have a way to check for such editing errors.
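Mechanically, that comparison is a set difference over the `#define` names. A minimal sketch, assuming the names have already been pulled out of the two generated headers into arrays; `report_missing()` and the sample names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if name occurs in list[0..n-1], else 0. */
static int contains(const char *const list[], size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/* Prints every message name found in the release header but absent from
   the header generated from the translated catalog, and returns how many
   were missing. */
size_t report_missing(const char *const release[], size_t nrel,
                      const char *const translated[], size_t ntrans)
{
    size_t missing = 0;
    for (size_t i = 0; i < nrel; i++) {
        if (!contains(translated, ntrans, release[i])) {
            printf("missing from translation: %s\n", release[i]);
            missing++;
        }
    }
    return missing;
}
```

Running it with the arguments swapped lists names that appear only in the translated catalog, such as a label the translator mistyped.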

Something to help us developers, too: tracking down obsolete messages is a
snap if you use a cross-referencer to find unused #defines in your generated
.h files.  Maybe the message really is obsolete and should be removed, since
translating obsolete messages into a dozen or so languages can cost big
bucks, pounds, marks, yen, etc.  Or maybe you added an error message to the
catalog because you knew you'd need it, but you forgot to code that else
clause!
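A crude stand-in for the cross-referencer reports each generated name that never appears as a whole identifier in the source text. This sketch is purely textual (unlike a real cross-referencer, it does not skip comments or string literals), and `report_unused()` and the sample names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True if c can appear inside a C identifier. */
static int ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns 1 if name (non-empty) occurs in src as a whole identifier,
   not merely as part of a longer one such as MSG_DISK_FULL_2. */
static int referenced(const char *src, const char *name)
{
    size_t len = strlen(name);
    for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name)) {
        int starts = (p == src) || !ident_char(p[-1]);
        int ends = !ident_char(p[len]);
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Prints each generated #define name never referenced in src and
   returns how many there were. */
size_t report_unused(const char *const names[], size_t n, const char *src)
{
    size_t unused = 0;
    for (size_t i = 0; i < n; i++) {
        if (!referenced(src, names[i])) {
            printf("never referenced: %s\n", names[i]);
            unused++;
        }
    }
    return unused;
}
```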

Am I reaching?  Am I stretching my logic to make a point?  Yup!  But does
anyone still think that a tool to create a .h file out of a message catalog
is useless?  :-)

erik@srava.sra.co.jp (Erik M. van der Poel) writes:

> Using numbers for the message ids was a bad idea in the first place.
> (Thank goodness XPG3 and AT&T's specs are not International
> Standards.)

Once compiled into an executable, no one need care what the representation
of a message id is.  Nobody says that the #define in the generated
header ultimately has to resolve to an integer.  It merely has to resolve
to whatever the functional interface requires, and if that changes you
just change the .h generation tool.  Information hiding strikes again!
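To illustrate the information hiding, here are two hypothetical generator outputs side by side, selected by an invented `MSG_IDS_ARE_STRINGS` macro; a real generator would emit only one of them. The `get_message()` interface and all `MSG_*` names are assumptions, and the lookup bodies are deliberately simple.

```c
#include <stddef.h>
#include <string.h>

#ifdef MSG_IDS_ARE_STRINGS
/* Generator output A: ids are symbolic string keys. */
typedef const char *MsgId;
#define MSG_DISK_FULL  "disk_full"
#define MSG_PRINT_DONE "print_done"
static const struct { MsgId key; const char *text; } table[] = {
    { MSG_DISK_FULL,  "The disk is full." },
    { MSG_PRINT_DONE, "Printing finished." },
};
const char *get_message(MsgId id)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].key, id) == 0)
            return table[i].text;
    return NULL;
}
#else
/* Generator output B: ids are plain integers. */
typedef int MsgId;
#define MSG_DISK_FULL  0
#define MSG_PRINT_DONE 1
static const char *const table[] = {
    [MSG_DISK_FULL]  = "The disk is full.",
    [MSG_PRINT_DONE] = "Printing finished.",
};
const char *get_message(MsgId id)
{
    if (id < 0 || (size_t)id >= sizeof table / sizeof table[0])
        return NULL;
    return table[id];
}
#endif

/* Application code: identical under either representation. */
const char *disk_full_text(void)
{
    return get_message(MSG_DISK_FULL);
}
```

`disk_full_text()` compiles unchanged under either definition; only the generated part differs.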

> Wouldn't it be possible to create a reasonably efficient
> implementation using hashing and caching with symbolic names instead
> of numeric ids? Then we can add/delete/modify messages at will. We
> should leave numbering and counting to the computer.

Yes it is possible, but why bother?  The organization of the run-time store
of messages can be changed for efficiency without any impact on the functional
interface.  As an example, I have implemented a (non-Unix-based) system that
compiles the equivalent of the message catalog into assembler code for a
function that retrieves the messages from (again, the equivalent of) the
text segment of a shared runtime archive.  The performance is frighteningly
good, and I don't do any fancy indexing or hashing.  I could add it, but
for a large-scale multi-user application the biggest bang for the buck was
in reducing paging by using non-modifiable shared memory instead of data space.
Yes, it just uses integer ids, and yes, it generates the headers.
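The system described above was not Unix-based and generated assembler, so the following is only a rough C analogue, assuming the generator emits constant data instead: all message text packed into one `const` array and indexed by integer id through a table of offsets. Because neither array holds pointers, there is nothing for the loader to relocate, and a typical toolchain places both in a read-only section that can be shared between processes. All names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical generated header part: integer ids starting at 1. */
#define MSG_DISK_FULL  1
#define MSG_PRINT_DONE 2
#define MSG_LAST       2

/* Hypothetical generated data: every message packed into one string,
   each terminated by '\0', located by offset rather than by pointer. */
static const char msg_text[] =
    "The disk is full.\0"        /* id 1, offset 0  (17 chars + NUL) */
    "Printing finished.";        /* id 2, offset 18 */

static const unsigned short msg_offset[MSG_LAST] = {
    0,    /* MSG_DISK_FULL  */
    18,   /* MSG_PRINT_DONE */
};

/* Plain array indexing: no hashing and no searching.
   Returns NULL for an id outside 1..MSG_LAST. */
const char *get_message(int id)
{
    if (id < 1 || id > MSG_LAST)
        return NULL;
    return msg_text + msg_offset[id - 1];
}
```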

nazgul@alphalpha.com (Kee Hinckley) writes:

>                            In addition there is a way to, if not prevent
> the problem, at least spot it.  Simply have a convention, as a user
> of message catalogs, that messageId #1 is a version number.  Every
> time you make an incompatible change to the catalog, change the version
> number.  Have your application check the version number and complain
> if it doesn't match.

More than that, have it check that the last message exists and that the
one past the last does not, to verify that the catalog has exactly the right
number of entries.  Don't
tolerate any errors in the message configuration -- they're just as critical
as errors in configuration of executables.  Just don't take a checksum! :-)
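A minimal sketch of both checks, written against a lookup callback so it can run against a real catalog or against a test array. It assumes the conventions just described (a version string in message 1 and a known last message number). `check_catalog()`, `catgets_lookup()` and `array_lookup()` are hypothetical helpers; `catgets()` and `NL_SETD` are the standard ones from `<nl_types.h>`, and using set `NL_SETD` is an assumption about how the catalog was built.

```c
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/* Lookup callback: returns the text of message `id`, or NULL if absent. */
typedef const char *(*MsgLookup)(void *ctx, int id);

/* Message 1 must hold the expected version, message `last` must exist,
   and message `last + 1` must not.  Returns 0 if the catalog passes,
   otherwise a nonzero code naming the first failure. */
int check_catalog(MsgLookup lookup, void *ctx,
                  const char *expected_version, int last)
{
    const char *version = lookup(ctx, 1);
    if (version == NULL || strcmp(version, expected_version) != 0)
        return 1;                       /* wrong or missing version */
    if (lookup(ctx, last) == NULL)
        return 2;                       /* catalog is short */
    if (lookup(ctx, last + 1) != NULL)
        return 3;                       /* catalog has extra entries */
    return 0;
}

/* Adapter for a catalog opened with catopen(); ctx points to its nl_catd.
   catgets() returns its last argument when the message is missing, so a
   private sentinel buffer distinguishes "absent" from real text. */
static char missing_sentinel[] = "";

const char *catgets_lookup(void *ctx, int id)
{
    nl_catd cd = *(nl_catd *)ctx;
    char *s = catgets(cd, NL_SETD, id, missing_sentinel);
    return s == missing_sentinel ? NULL : s;
}

/* Array-backed lookup for testing without a compiled catalog: ctx points
   to a NULL-terminated array whose element i-1 is message i. */
const char *array_lookup(void *ctx, int id)
{
    const char *const *msgs = ctx;
    if (id < 1)
        return NULL;
    for (int i = 1; i < id; i++)        /* never read past the terminator */
        if (msgs[i - 1] == NULL)
            return NULL;
    return msgs[id - 1];
}
```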

If you want real safety, make the versioning mechanism automatic.  Have
your make file bump it after any change that affected the .h file, and drop
the new version number into both the message cat and the .h.  Have your code
do its version check comparing the run-time version against a symbolic
constant from the very same include file!  A re-compile of the code that
includes the .h is forced anyhow, so the code is always in step with the
message catalog version.  Now, provide a modified version of the make file
for your translators that does the same checking, but instead of bumping the
version and triggering a re-compile (you don't give them source anyhow), it
simply reports an error.
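A sketch of the compile-time half of that scheme, assuming the make rule writes the same version string into message 1 of the catalog and into the generated header. The header excerpt, `MSGCAT_VERSION`, `MSG_VERSION` and `catalog_matches_build()` are hypothetical, and the make rule itself is not shown.

```c
#include <string.h>

/* Hypothetical excerpt of the generated header.  The make rule rewrites
   MSGCAT_VERSION whenever the header changes and writes the same string
   into message MSG_VERSION of the catalog. */
#define MSGCAT_VERSION "42"
#define MSG_VERSION    1

/* Compares the version read from the catalog at run time with the one
   compiled in from the very same header.  Any header change forces a
   recompile, so a mismatch means the catalog and the executable come
   from different builds.  A NULL (missing) version counts as a mismatch. */
int catalog_matches_build(const char *runtime_version)
{
    return runtime_version != NULL
        && strcmp(runtime_version, MSGCAT_VERSION) == 0;
}
```

At start-up the program might call `catalog_matches_build(catgets(cd, NL_SETD, MSG_VERSION, NULL))`, where `cd` came from `catopen()`. Since `catgets()` returns its last argument (here NULL) when the message is missing, a catalog without a version message is rejected too.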

A final comment:

The main reason that I am concerned about this is that internationalization
of code must not violate developers' sense of what is right.  The only people
I have run into who are more fanatic than non-English speakers who (rightly)
flame against non-translatable code, are developers who (rightly) flame
against un-readable code.  There is finally real recognition of the need for
designing internationalization in applications from Day One, and this has been
a hard-fought victory.  Let's not make the software so ugly that everyone will
go back to the old attitude of "we'll worry about international in release 2".

rich schwartz   (All views expressed are my own, and not Wang Labs, Inc.'s.).
 rschwartz@office.wang.com      VOICE (508) 967 5027     FAX (508) 967 0947
     Wang Labs, Inc., M/S 019-58A, 1 Industrial Ave., Lowell, MA 01851