Path: utzoo!news-server.csri.toronto.edu!rutgers!sun-barr!newstop!exodus!rbbb.Eng.Sun.COM!chased
From: chased@rbbb.Eng.Sun.COM (David Chase)
Newsgroups: comp.std.c++
Subject: Tags, typecodes, experience with these things.
Summary: Calm down a bit, please
Message-ID: <9661@exodus.Eng.Sun.COM>
Date: 12 Mar 91 19:30:05 GMT
References: <1399@culhua.prg.ox.ac.uk> <27D3E544.619A@tct.uucp> <4338@lupine.NCD.COM> <1991Mar12.123811.13701@kodak.kodak.com>
Sender: news@exodus.Eng.Sun.COM
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 91

I've been trying to follow this discussion, but there's a bit too much
heat and a little too little content.  Could someone please sit down
and take the time to summarize what appear to be the two positions?

Now, a small contribution.  I implemented the C-generating back-end
for Modula-3 back when Olivetti had a research center in Menlo Park.
The language specified a "TYPECODE" for each type.  These were
implemented with substantial support from the language system.  The
major differences (for purposes of this discussion) between M-3 and
C++ are:

1) C++ has multiple inheritance, Modula-3 does not.
2) Modula-3 has more "interesting" ways to do information-hiding
   (Strictly speaking, the compiler might not know where member fields
   or functions are located.  This can be resolved at (pre-)link
   time.)
3) M-3 has no constructors or destructors (yet).
4) In M-3, ALL member functions are virtual.

Things to note:

a) If all member functions are virtual, the run-time cost for typecodes
is as small as it can get: every object already carries a vptr, so the
typecode can be stored once per type in the virtual function table at
little cost, with no extra space in each object.

b) The (known) lookup algorithms for single inheritance are much
simpler and cheaper than those for multiple inheritance.  "Is x a T"
can be implemented with two comparisons.  "TYPECASE" is more complex,
but takes O(log #cases) time.  The data structure implementing these
things can be updated in the event of dynamically-loaded code (with
new and old types in it) in a multi-threaded system, with minimal
synchronization requirements (readers must see writes in the same
order that writers make them -- NARROW sometimes takes longer in the
event that a race is lost, but it never lies).

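c) TYPECASE is NOT equivalent to "switch": a switch selects on an exact
value, while TYPECASE runs the first arm whose type contains the object
(its own type or a supertype), so several arms can match and their
order matters.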

d) TYPECODEs are NOT necessarily the same from run to run.  Thus, they
are not useful by themselves for saving and restoring pickled
data.  (What if you relink in a different order?  The data should
still be good, etc.)
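The single-inheritance lookups in b) and c) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the original implementation: it assumes typecodes are assigned by a preorder walk of the inheritance tree, so that each type and its subtypes occupy a contiguous range of typecodes, and the names `TypeTree`, `IsA`, and `Typecase` are hypothetical. It ignores dynamic loading and the multi-threaded update scheme mentioned in b).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-inheritance type tree. Typecodes are assigned in
// preorder, so a type T and all its subtypes share the range
// [Typecode(T), Last(T)].
class TypeTree {
 public:
  // Adds a type under `parent` (-1 for the root) and returns its id.
  int Add(int parent) {
    first_.push_back(0);
    last_.push_back(0);
    children_.emplace_back();
    int id = static_cast<int>(first_.size()) - 1;
    if (parent >= 0) children_[parent].push_back(id);
    return id;
  }

  // Assigns typecodes by a preorder walk starting at `root`.
  void Number(int root) {
    int next = 0;
    Visit(root, next);
  }

  int Typecode(int type) const { return first_[type]; }  // start of range
  int Last(int type) const { return last_[type]; }       // end of range

  // "Is an object with typecode tc a T?" costs two comparisons.
  bool IsA(int tc, int type) const {
    return first_[type] <= tc && tc <= last_[type];
  }

 private:
  void Visit(int t, int& next) {
    first_[t] = next++;
    for (int child : children_[t]) Visit(child, next);
    last_[t] = next - 1;  // largest typecode handed out in t's subtree
  }
  std::vector<int> first_, last_;
  std::vector<std::vector<int>> children_;
};

// TYPECASE: the first arm (in source order) whose type contains the object
// wins. Precompute the segments of the typecode line on which the winning
// arm is constant, then dispatch by binary search in O(log #arms).
class Typecase {
 public:
  Typecase(const TypeTree& tree, const std::vector<int>& arm_types) {
    // Every arm's range starts and ends at a cut, so arm membership is
    // constant inside each segment and can be tested at its start.
    std::vector<int> cuts;
    for (int t : arm_types) {
      cuts.push_back(tree.Typecode(t));
      cuts.push_back(tree.Last(t) + 1);
    }
    std::sort(cuts.begin(), cuts.end());
    cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());
    for (int start : cuts) {
      int winner = -1;  // -1 means the ELSE arm
      for (std::size_t i = 0; i < arm_types.size(); ++i) {
        if (tree.IsA(start, arm_types[i])) {
          winner = static_cast<int>(i);
          break;  // first match wins, unlike a switch
        }
      }
      starts_.push_back(start);
      winners_.push_back(winner);
    }
  }

  // Returns the index of the arm that handles typecode tc, or -1 for ELSE.
  int Dispatch(int tc) const {
    auto it = std::upper_bound(starts_.begin(), starts_.end(), tc);
    if (it == starts_.begin()) return -1;  // below every arm's range
    return winners_[static_cast<std::size_t>(it - starts_.begin()) - 1];
  }

 private:
  std::vector<int> starts_;   // sorted segment start points
  std::vector<int> winners_;  // winning arm for each segment
};
```

With this numbering, types loaded later would need renumbering or spare ranges reserved in advance; the sketch does not attempt either.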

We used this information to help implement a "pickling" (storing data
to disk) system.  The pre-linker also generated "fingerprints", which
were large (63-bit?) numbers carefully hashed from the structure of a
type, and these served to identify types from run to run.  (We cheated
and did not also write out the type strings, so there was a vanishingly
small probability that two distinct types would hash to the same
fingerprint and thwart our algorithm.)
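A minimal sketch of fingerprinting a type from its structure follows. It substitutes 64-bit FNV-1a for whatever hash the pre-linker actually used, and the `TypeDesc`/`Field` description and its canonical string encoding are hypothetical. It handles only non-recursive types; a real system would also have to encode cycles in the type graph. Because the result depends only on structure, it is the same in every run regardless of link order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical structural description of a type.
struct TypeDesc;
struct Field {
  std::string name;
  const TypeDesc* type;
};
struct TypeDesc {
  std::string kind;           // e.g. "INTEGER", "REAL", "RECORD"
  std::vector<Field> fields;  // empty for scalar types
};

// Canonical text form of a non-recursive type,
// e.g. RECORD(x:INTEGER;y:REAL;)
std::string Canonical(const TypeDesc& t) {
  std::string s = t.kind;
  if (!t.fields.empty()) {
    s += '(';
    for (const Field& f : t.fields) {
      s += f.name;
      s += ':';
      s += Canonical(*f.type);
      s += ';';
    }
    s += ')';
  }
  return s;
}

// 64-bit FNV-1a over the canonical form (a stand-in for the real hash).
std::uint64_t Fingerprint(const TypeDesc& t) {
  std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
  for (unsigned char c : Canonical(t)) {
    h ^= c;
    h *= 1099511628211ull;                    // FNV-1a 64-bit prime
  }
  return h;
}
```

Two separately declared but structurally identical types get the same fingerprint; reordering fields changes it.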

At run-time, the interface went from [type or object] to typecode to
type-fingerprint.  The typecodes might vary, but the fingerprints did
not.  To read the data back in, fingerprints were mapped to typecodes,
which gave access to a (low-level) interface to virtual function
tables, object size information, user-defined (*) unmarshalling code,
etc.
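The two directions of this interface can be sketched as a small registry. The names `TypeRegistry` and `TypeRuntimeInfo` and the choice of fields are assumptions for illustration; `method_table` merely stands in for the virtual function table, and typecodes here are just registration-order indices, which is one simple way they could come to differ between links.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical run-time information reachable from a typecode.
struct TypeRuntimeInfo {
  std::uint64_t fingerprint;  // stable across runs (hashed from structure)
  std::size_t object_size;    // bytes to allocate when unpickling
  const void* method_table;   // stands in for the virtual function table
};

class TypeRegistry {
 public:
  // Typecodes are dense indices in registration order, so they may change
  // whenever the program is relinked; fingerprints do not.
  int Register(const TypeRuntimeInfo& info) {
    int tc = static_cast<int>(by_code_.size());
    by_code_.push_back(info);
    by_fingerprint_[info.fingerprint] = tc;
    return tc;
  }

  // Writing: typecode -> fingerprint.
  std::uint64_t FingerprintOf(int tc) const {
    return by_code_.at(tc).fingerprint;
  }

  // Reading: fingerprint -> typecode, or -1 if this program lacks the type.
  int TypecodeOf(std::uint64_t fp) const {
    auto it = by_fingerprint_.find(fp);
    return it == by_fingerprint_.end() ? -1 : it->second;
  }

  // Typecode -> size, method table, and so on.
  const TypeRuntimeInfo& Info(int tc) const { return by_code_.at(tc); }

 private:
  std::vector<TypeRuntimeInfo> by_code_;
  std::unordered_map<std::uint64_t, int> by_fingerprint_;
};
```

Registering the same types in a different order, as a relinked program might, changes the typecodes but not the result of looking a type up by fingerprint.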

(*) The default marshalling and unmarshalling code was automatically
generated by a language-processing tool.  It is also possible to
perform a slower interpreted unmarshalling based on the structure of
the type.  Again, this information is stored behind an interface which
makes use of TYPECODEs.
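The slower, interpreted path can be sketched as a loop over per-type field descriptors. Everything here is hypothetical: the `FieldDesc` layout, the two field kinds, and the little-endian wire format are assumptions, and a real descriptor would also cover references, arrays, and nested records. In the system described above, the descriptor list would be reached through the typecode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical field descriptor: how a field is encoded and where it lives.
enum class FieldKind { kInt32, kInt64 };

struct FieldDesc {
  FieldKind kind;
  std::size_t offset;  // byte offset inside the in-memory object
};

// Reads n little-endian bytes from the stream as an unsigned value.
static std::uint64_t ReadLE(const std::vector<unsigned char>& in,
                            std::size_t& pos, std::size_t n) {
  if (pos + n > in.size()) throw std::runtime_error("truncated pickle");
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v |= std::uint64_t{in[pos + i]} << (8 * i);
  }
  pos += n;
  return v;
}

// Walks the descriptors instead of running generated code: slower, but it
// works for any type whose structure is known at run time.
void InterpretUnmarshal(const std::vector<FieldDesc>& fields,
                        const std::vector<unsigned char>& in, void* object) {
  std::size_t pos = 0;
  auto* base = static_cast<unsigned char*>(object);
  for (const FieldDesc& f : fields) {
    switch (f.kind) {
      case FieldKind::kInt32: {
        auto v = static_cast<std::uint32_t>(ReadLE(in, pos, 4));
        std::memcpy(base + f.offset, &v, sizeof v);  // copy the bit pattern
        break;
      }
      case FieldKind::kInt64: {
        std::uint64_t v = ReadLE(in, pos, 8);
        std::memcpy(base + f.offset, &v, sizeof v);
        break;
      }
    }
  }
}
```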

A second use for typecodes was in implementing a checked type cast
("NARROW") and TYPECASE.  One can argue that TYPECASE is not
object-oriented (I understand that argument, and TYPECASE clearly
becomes less necessary if you have correctly working multiple
inheritance), but NARROW appears to be necessary fairly often in C++
code (perhaps less so when templates are added, but I don't have any
experience with that, so I can't be sure).
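For comparison, a NARROW-like checked downcast can be written in today's standard C++ on top of its run-time type information, where `dynamic_cast` on a pointer to a polymorphic class yields a null pointer if the object is not of the requested type. `Narrow` is a hypothetical helper, not a standard function. It treats a failed check as an error instead of returning a pointer to the wrong type, and it lets a null pointer through, mirroring the fact that NIL belongs to every object type in Modula-3.

```cpp
#include <cassert>
#include <typeinfo>

// Checked downcast: returns p as a T*, or throws std::bad_cast if the
// object is not a T. Base must be a polymorphic class type.
template <typename T, typename Base>
T* Narrow(Base* p) {
  if (p == nullptr) return nullptr;  // a null pointer narrows to null
  T* result = dynamic_cast<T*>(p);
  if (result == nullptr) throw std::bad_cast();  // fail loudly, never lie
  return result;
}
```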

We didn't use typecodes for garbage collection, but that was because
(i) it would have been too inefficient and (ii) we never had the time
to do that much engineering of the garbage collector.  In principle,
however, the information to help the GC could be attached to the
virtual function table along with the typecode.
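Attaching collector information to the virtual function table can be sketched as a per-type descriptor that holds the typecode together with the byte offsets of the object's pointer fields. `TypeDescriptor`, `GcObject`, and `ScanObject` are hypothetical names, and the sketch assumes every collected object begins with its descriptor pointer (standing in for the vptr). It shows only how a collector could find an object's children, not a collector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-type descriptor, reached through each object's vptr:
// the typecode plus the offsets of pointer fields the collector must trace.
struct TypeDescriptor {
  int typecode;
  std::vector<std::size_t> pointer_offsets;
};

// Every collected object is assumed to start with this header.
struct GcObject {
  const TypeDescriptor* vptr;  // stands in for the vtable pointer
};

// Calls visit(child) for every non-null pointer field of obj, driven only
// by the descriptor, so no per-type scanning code is needed.
template <typename Visitor>
void ScanObject(const GcObject* obj, Visitor visit) {
  const auto* base = reinterpret_cast<const unsigned char*>(obj);
  for (std::size_t off : obj->vptr->pointer_offsets) {
    const GcObject* child;
    std::memcpy(&child, base + off, sizeof child);
    if (child != nullptr) visit(child);
  }
}
```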

----------------------------------------------------------------

In summary, TYPECODEs were fantastically useful, even if their only
use was to provide a nice name for a type in the interface to run-time
information about types.  Please consider the value of an interface to
run-time type information that actually hides some of the
implementation details.

I also think that the language system should support these features,
but that is probably a religious argument.  Certainly, if nifty
features are confined to objects with vptrs the costs can be kept low.

David Chase
Sun