Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!usc!rpi!clarkson!sun.soe.clarkson.edu!dean
From: dean@sun.soe.clarkson.edu (Dean Swan)
Newsgroups: comp.lang.smalltalk
Subject: Re: Smalltalk MetaClass questions.
Message-ID: <9103112121.AA14356@sun.soe.clarkson.edu>
Date: 11 Mar 91 21:21:47 GMT
Lines: 180

> >My implementation is most likely to be completely interpreted, except for a
> >minimal set of primitives, due to the nature of my storage manager, which
> >uses no object pointer table, and is not based on absolute pointers either.
> >My storage manager *does* however, support incremental garbage collection,
> >and fast object lookups without using generation scavenging, or reference
> >counts.  It also incorporates image compression automatically, so
> >redundant chunks of memory are minimized.  The virtual machine "runs" the
> >compressed image, not an uncompressed copy.
> >
> 
> A very impressive-sounding scheme.  I'd be interested in knowing how
> that works.  This would seem to be reversing the standard tradeoff
> (using space to get speed) in order to fit into limited memory, but is
> then likely to suffer from very slow speeds, both from disk accesses
> and from dealing with compressed data.
>   The scheme sounds like it would be of value even for "normal"
> machines and such technology would be a good thing to feed back into
> GNU Smalltalk.

     How it works?  Hmm. Where to start?  Ok...  My basic model for memory
is a large array of what I call 'associations'.  All objects in my system
are assigned a 32 bit unsigned number when they are created.  An 'association'
is an ordered pair of object numbers, a 'from' object and a 'to' object.
This is the heart of the system.  To build a complex object, you simply
create as many associations as you need, where the 'from' object is the
same for each association, and the 'to' object is the object number of a
component object of the complex object you are building.

     I impose a few rules to make this scheme practical.  First, memory
is a sorted array of associations sorted by 'from', then by 'to'.  The
object number FFFFFFFF (hex) is used to denote the NULL object, and it
is ignored by the sort algorithm for memory.  This way, you can delete
an object by blanking both the 'from' and 'to' fields to NULL, and leave
a hole in memory that will get cleaned up by the incremental garbage
collector.  Obviously, some other object numbers will have to be reserved
for things like primitive methods, character constants, small integers, and
the like.

     The second rule is that only the highest numbered object in the system
can grow (i.e. add more associations to the object).  This is not a big
problem since all newly created objects are the highest numbered object
when they are created, and for the few other objects that need to get bigger
dynamically, they will be 'translated' to the highest number, by creating
a new object, copying the 'to' fields of the old object into the new object
and fixing up references to the old object, then deleteing the old object
by filling it with NULLs.  To speed up the fix of references to the old
object number, every object will maintain a Dependents collection that can
be referred to ONLY by it's owner object.  This is also not a problem
Smalltalk/V does this anyway in the global dictionary 'Dependents'.

     Also, to avoid wild amounts of object tranlation, every class will have
an 'initialSize', and a 'growSize'.  For fixed size objects (which most
objects are) the 'growSize' would be 0.  For variable sized objects, the
'growSize' can be tuned based on profiles of the running system to optimize
performance.  Unused associations that are created in a 'grow' operation
will have their 'to' field set to NULL.

     The garbage collector will also incrementally translate objects down
to the lowest unused object number, so you don't run out of object numbers.

     Now regarding the disk read time problem, main memory of the target
machine will be used as a dynamic object cache.  Ideally I'd like to implement
an 'association' cache rather than an object cache, so the system can deal
with objects that are larger than main memory.  One of my potential
applications would be working with digitally sampled sounds, which could
easily be very large.

     As to the speed of dealing with a compressed image, I expect that the
compression will help rather that hurt performance.  I think I/O is the real
bottleneck on most machines, so the fewer bytes you have to deal with, the
faster it should run.

     My compression scheme is based on the idea that, for example, text is
composed of paragraphs, which are composed of sentances, which are composed
of phrases, which are composed of words, which are composed of syllables, which
are composed of syllables, which are composed of characters.  The idea is that
you build a tree of characters, where each leaf node represents a syllable,
then you build a tree of syllables where each leaf node represents a word,
then a tree of words, where each leaf is the first word of a phrase, and
so on.   To add a paragraph to a document, you just add a single 'association'
to the document object.  To compare two words, you need only compare the 'to'
field of two associations.  Since the each word will exist exactly once, if
the object numbers are identical, the two objects being compared point to the
same word.  This concept can be extrapolated to encompass any kind of complex
object.

    Comparisons for greater-than and less-than would be a little more
involved, but if you keep each level of the trees sorted, then it would be
just a comparison of two pointers again.  Keeping the trees sorted might
cost more than it's worth though.  I don't know yet.  It would make 'read'
operations much faster, but would slow down object creation quite a bit,
I suspect.

     Object lookups will be performed by a modified binary search on the
memory image.  (The array of 'associations', of course exists as a large
disk file).  Given that the average object is small, and that most objects
don't change size, I don't think that the restriction of memory space being
sorted will pose a big problem.  Most objects will be sorted as they are
created, and remain that way.

     The *MAJOR* advantage of this whole scheme is that object numbers have
no direct correlation, whatsoever, to physical addresses.  This means that
as long as you can live with the limitations of 4 billion or so objects,
the image can grow as large as disk space allows, and you don't have to
deal with lots of messy pointers that are 'married' in some way to physical
addresses.

     I don't have any real numbers yet, but if an average system has
an image size of say 8 Meg, you'll never have to probe the image more than
23 times to find the object you need, and the binary search algorithm could
be optimized by caching physical adresses of recently used objects to narrow
down the search.  I'd like to think that the real base image will be well
under 1 Meg, with the source code included, so it could concievably even
run off a high density floppy.  If the caching scheme is done right, this
could work well.

     Another thought for speed optimization would be 'expression caching'.
Numbers, for example are immutable objects, so if you perform a multiplication
of two numbers, (especially Floats or LargeIntegers), as long as you have
space in the image (i.e. free disk space) you could cache expressions and
perform lookups instead of recomputing things.  This would greatly speed
up operations like Fourier Transforms, and the like.  This expression cacheing
could all be implemented in Smalltalk, and just have the garbage collector
generate an interrupt so that Smalltalk code could purge the expression
cache when the image starts to get 'full'.

     Because of the built in compression scheme, each number that is in use
by the system will exist in the image exactly once, so expression caching
shouldn't cause the image to grow too much.  Other expression types besides
numeric computations could also be similarly cached.

> >My motivation for this project: dBase sucks.  I'm intending to use the
> >finished product to produce large information systems, based on a large,
> >persistent object space.
> 
> Don't forget hypercard/form generator like tools.  A nice class
> library for that sort of thing would be very useful.

     Definately!  Forms are very common in everyday business applications,
and my implementation of Smalltalk must include a real development
environment, and user interface tools.

> >Now, on to my question:  Could someone be so kind as to explain the workings
> >of the Behavior, Class, MetaClass hierarchy?  I've read the Blue Book, traced
> >through Smalltalk/V, and GNU Smalltalk, and I'm still thouroghly confused.
>
>   The superclass of an Object is nil, BUT the superclass of the class
> Object is Class.  Thus class methods go further up the chain than
> object, and finally reach Behavior, where 'new' is defined.  

     Ok, this makes sense.  Not obvious though, even when looking at a
running system.

> Note: Class and Metaclass have lots of subclasses, they're just
> invisible.

     How is it that they are invisible?

> What this gets you is reflectiveness.  I.E. the way that the entire
>   One might also consider that metaclasses must be objects and members
> of some class.  This could be a metametaclass, but in fact here the
> system twists around on itself and a metaclass is an instance (and a
> subclass) of Metaclass.  (I think, even I'm getting confused now).  

     So basically this is all just to avoid infinite recursion, yes?

>   Anyway, the best thing to do is avoid thinking about these things
> whenever possible.

     Agreed, but unfortunately, I have to think about it for a while, at
least until I get an initial image built and working.  I'm still a little
confused about the actual mechanism, but at least now I know why it's there.


>     Alan Knight       knight@mrco.carleton.ca                  Outlaw
>     Dept. of Mechanical and Aeronautical Engineering           Software
>     Carleton University, Ottawa, Ontario, Canada, K1S 5B6      Patents

-Dean Swan                          Independent Software Consultant
dean@sun.soe.clarkson.edu           Tel: (315) 265-2679