Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!ucsd!pacbell.com!lll-winken!bert.llnl.gov!howell
From: howell@bert.llnl.gov (Louis Howell)
Newsgroups: comp.std.c++
Subject: Re: Randomly ordered fields !?!? (Was:
Message-ID: <1990Sep4.163132@bert.llnl.gov>
Date: 4 Sep 90 23:31:32 GMT
References: <1990Aug27.152540@bert.llnl.gov> <1990Aug28.211752.24905@zorch.SF-Bay.ORG> <1990Aug28.173553@bert.llnl.gov> <1990Sep1.131041.15411@zorch.SF-Bay.ORG>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: howell@bert.llnl.gov (Louis Howell)
Organization: Lawrence Livermore National Laboratory
Lines: 170

In article <1990Sep1.131041.15411@zorch.SF-Bay.ORG>,
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
|> howell@bert.llnl.gov (Louis Howell) writes:
|> >In short, you want four types of compatibility: "comm links",
|> >"time", "memory space", and "file storage".  First off, "time" and
|> >"file storage" look like the same thing to me,
|> 
|> Not so.  If a program compiled with compiler A stores data in a
|> file, and a program compiled with compiler B can't extract it, that
|> is one type of compatibility problem to solve, and it can be solved
|> with the compilers at hand.
|> 
|> But if a program compiled with compiler A revision 1.0 stores data
|> in a file, and a program compiled with compiler A revision 4.0
|> cannot extract it, that is a compatibility problem to solve of a
|> different type.  Mandating no standard for structure layout forces
|> programmers in both these cases to anticipate problems, unpack the
|> data, and store it in some unstructured format.  Tough on the
|> programmer who realizes this only when compiler A revision 4.0 can
|> no longer read the structures written to the file with compiler A
|> revision 1.0; it may not be around any more to allow a program to
|> be compiled to read and rewrite that data.

I don't want to reduce this discussion to finger-pointing and
name-calling, but I think this hypothetical programmer deserved
what he got.  I think it's a useful maxim to NEVER write anything
important in a format that you can't read.  This doesn't
necessarily mean ASCII---there's nothing wrong with storing
signed or unsigned integers, IEEE format floats, etc., in binary
form, since you can always read the data back out of the file
in these formats.  If a programmer whines because he depended on
some nebulous "standard structure format" and got burned, then I
say let him whine.  Now if there actually were a standard---IEEE,
ANSI, or whatever---then the compilers should certainly support
it.  Recent comments in this newsgroup show, however, that there
isn't even a general agreement on what a standard should look like.
Let's let the state of the art develop to that point before we
start mandating standards.
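
To make the "format that you can read" idea concrete, here is a
minimal sketch of writing a record field by field with an explicit
byte order, so the stored bytes do not depend on how any compiler
lays out the structure.  The Sample type, the function names, and
the choice of big-endian order are my own assumptions for
illustration, and the sketch uses modern C++ facilities such as
<cstdint> that were not available when this was posted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record; the field names are illustrative only.
struct Sample {
    std::uint32_t id;
    std::int32_t  count;
};

// Append a 32-bit value as four bytes, most significant byte first.
void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
}

// Read four bytes starting at pos, most significant byte first.
std::uint32_t get_u32(const std::vector<unsigned char>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v = (v << 8) | in[pos + i];
    return v;
}

// Encode field by field: the stored format is defined by this code,
// not by the in-memory layout of Sample.
std::vector<unsigned char> encode(const Sample& s) {
    std::vector<unsigned char> out;
    put_u32(out, s.id);
    put_u32(out, static_cast<std::uint32_t>(s.count));
    return out;
}

// Decode in the same field order.  The unsigned-to-signed conversion
// assumes a two's complement machine (guaranteed only since C++20).
Sample decode(const std::vector<unsigned char>& in) {
    Sample s;
    s.id = get_u32(in, 0);
    s.count = static_cast<std::int32_t>(get_u32(in, 4));
    return s;
}
```

A record written this way is always 8 bytes in a known order, so a
program built with a different compiler, or a later revision of the
same one, can read it back without knowing the original layout.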

|> [...]

|> >As for "memory space",
|> >I think it reasonable that every processor in a MIMD machine,
|> >whether shared memory or distributed memory, should use the same
|> >compiler.
|> 
|> That isn't good enough.  I've worked in shops with several million
|> lines of code (about 7.0) in executing software.  By mandating _no_
|> standards for structure layout, you force that _all_ of this code be
|> recompiled with every new release of the compiler, if the paradigm
|> of data sharing is a shared memory environment.  Again, by refusing
|> to make one choice, you force several other choices in ways perhaps
|> unacceptable to the compiler user.  In this situation, that might
|> well involve several man-years of effort, and it is sure to invoke
|> every bug in the new release of the compiler simultaneously, and
|> would very likely bring operations to a standstill.  With no data
|> structure layout standard, you have removed the user's choice to
|> recompile and test incrementally, or else forced him to pack and
|> unpack data even to share it in memory.

This is the only one of your arguments that I can really sympathize
with.  I've never worked directly on a project of anywhere near that
size.  As a test, however, I just timed the compilation of my own
current project.  4500 lines of C++ compiled from source to
executable in 219 seconds on a Sun 4.  Scaling linearly to 7 million
lines gives 3.41e5 seconds or about 95 hours of serial computer
time---large, but doable.  Once you add in the human time required
to deal with the inevitable bugs and incompatibilities, it becomes
clear that switching compilers is a major undertaking, one that
should not happen more often than once a year or so.
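
For anyone who wants to check the scaling, the estimate can be
reproduced with a few lines of code.  This is only a sketch of the
linear extrapolation described above; the function names are made
up for illustration, and real compile time need not grow in strict
proportion to line count.

```cpp
#include <cassert>

// Linear extrapolation of compile time from a measured sample.
// Assumes compile time is proportional to the number of lines,
// which is only a rough estimate.
double scaled_seconds(double sample_lines, double sample_seconds,
                      double target_lines) {
    assert(sample_lines > 0.0);
    return sample_seconds / sample_lines * target_lines;
}

// Convert seconds to hours.
double to_hours(double seconds) {
    return seconds / 3600.0;
}
```

Calling scaled_seconds(4500.0, 219.0, 7.0e6) gives roughly 3.41e5
seconds, and to_hours of that result is a little under 95 hours.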

The alternative, though, dealing with a multitude of different
modules each compiled under slightly different conditions, sounds
to me like an even greater nightmare.  Imagine a code that works
only when module A is compiled with version 1.0, module B with
version 2.3, and so on.  Much better to switch compilers very
seldom.  If you MUST work that way, though, note that you would
not expect the ordering methods to change with every incremental
release.  Changes like that would constitute a major compiler
revision, and would happen only rarely.

You can still recompile and test incrementally if you maintain
separate test suites for each significant module of the code.  If
the only test is to run a single 7 million line program and see if
it smokes, your project is doomed from the start.  (1/2 :-) )

Again, most users don't work in this type of environment.  A
monolithic code should be written in a very stable language to
minimize revisions.  (Fortran 66 comes to mind. :-)  The price is
not using the most up to date tools.  C++ just isn't old enough
yet to be very stable.  If I suggested changing the meaning of
a Fortran format statement, I'd be hung from the nearest tree,
and I'd deserve it, too.

|> [...]

|> >Finally, the issue of communication over comm links strikes me as
|> >very similar to that of file storage.  If compatibility is
|> >essential, design the protocol yourself; don't expect the compiler
|> >to do it for you.  Pack exactly the bits you want to send into a
|> >string of bytes, and send that.  You wouldn't expect to send
|> >structures from a Mac to a Cray and have them mean anything, so why
|> >expect to be able to send structures from an ATT-compiled program
|> >to a GNU-compiled program?  If you want low-level compatibility,
|> >write low-level code to provide it, but don't handicap the compiler
|> >writers.
|> 
|> Same comments apply.  In a widespread worldwide network of
|> communicating hardware, lack of a standard removes the option to
|> send structures intact.  One choice (let compiler writers have free
|> reign for their ingenuity in packing structures for size/speed)
|> removes another choice (let programmers have free reign for their
|> ingenuity in accomplishing speedy and effective communications).
|> Somebody loses in each case, and I see the losses on the user side
|> to far outweigh in cost and importance the losses on the compiler
|> vendor side.

I think Stephen Spackman's suggestion of standardizing the stream
protocol, but not the internal storage management, is the proper
way to go here.

|> Then again, I write application code, not compilers, which could
|> conceivably taint my ability to make an unbiased call in this case.
|> ;-)

Hey, I'm a user too!  I do numerical analysis and fluid mechanics.
What I do want is the best tools available for doing my job.  If
stability were a big concern I'd work in Fortran---C++ is considered
pretty radical around here.  I think the present language is a
big improvement over alternatives, but it still has a way to go.
If we clamp down on the INTERNAL details of the compiler now, we
just shut the door on possible future improvements, and the action
will move on to the next language (D, C+=2, or whatever).  C++
just isn't old enough yet for us to put it out to pasture.

As a compromise, why don't we add to the language the option of
specifying every detail of structure layout---placement as well
as ordering?  This would satisfy users who need low-level control
over structures, without forcing every user to painfully plot
out every structure.  Just don't make it the default; most people
don't need this capability, and instead should be given the best
machine code the compiler can generate.
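
The language has no such option, but as a rough illustration of
what opt-in layout control might look like to a user, here is a
sketch that states the intended placement of each member and has
the compiler reject the program if the actual layout differs.  It
relies on static_assert and offsetof from modern C++, which did not
exist when this was written, and the Header type and its fields are
hypothetical.  It only checks the layout the compiler chose; it
does not dictate it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header; names and sizes are illustrative only.
// All members share one access level, so they appear in memory in
// declaration order; only the padding is left to the compiler.
struct Header {
    std::uint32_t length;  // intended offset 0
    std::uint16_t kind;    // intended offset 4
    std::uint16_t flags;   // intended offset 6
};

// Compilation fails if the compiler's layout differs from the
// intended one, so only structures that carry these checks pay for
// a fixed layout.
static_assert(offsetof(Header, length) == 0, "length must be at offset 0");
static_assert(offsetof(Header, kind) == 4, "kind must be at offset 4");
static_assert(offsetof(Header, flags) == 6, "flags must be at offset 6");
static_assert(sizeof(Header) == 8, "Header must occupy exactly 8 bytes");
```

Structures without such checks stay free for the compiler to
arrange however it likes, which matches the "not the default" part
of the proposal.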

Louis Howell

#include <std.disclaimer>