Xref: utzoo comp.lang.smalltalk:3117 comp.object:3814
Newsgroups: comp.lang.smalltalk,comp.object
Path: utzoo!utgpu!cunews!bnrgate!bqnes74!news
From: CWatts@BNR.CA (Carl Watts)
Subject: Smalltalk scaleability & IdentityDictionaries
Message-ID: <1991Jun26.193441.28581@bqnes74.bnr.ca>
Sender: news@bqnes74.bnr.ca
Organization: Bell Northern Research
Date: Wed, 26 Jun 91 19:34:41 GMT

Gray Huggins of Texas Instruments recently sent me mail asking me
questions and expressing concerns about Smalltalk scaleability.  The
example given was the limitation of the number of elements in an
IdentityDictionary.  My response (which follows), I think, would be of
interest to other members of comp.lang.smalltalk and comp.object:

Gray,

Smalltalk scales better than other other language I know of.  The
reason is simple, you can model everything from the Space Shuttle to an
Integer the same way.  As an object (that perhaps uses other objects)
that provides an excapsulated interface to some behavior.  Now a Space
Shuttle object may have thousands of other objects that it uses to
provide its behavior, and an Integer may use no other objects to
provide its behavior, but the ways in which you treat the two objects
are pretty much the same.  They both are Objects.  And they both have a
message interface allowing interaction with it.

Smalltalk was designed from the start to not have any problems
scaling.  What other language do you now of that can as easily handle 2
+ 3 as it can evaluate:

958685745856745747465784756948574847584737384877584 +
9293458823458723745734875847574874875748874374373646

Smalltalk tells you the answer to the first is 5 and the answer to the
second is  10252144569315469493200660604523449723333611759251230

Smalltalk was the first general purpose language I know of that allowed
infinite precision Integer arithmetic.  Its thanks to the objects
Integer, LargePositiveInteger, and LargeNegativeInteger.

The IdentityDictionary problem is extremely easy to solve.  Just get
out of the mindset that the Classes that come with Smalltalk are
somehow holier-than-holy and are by definition perfect and complete.

They aren't.  Every Class in Smalltalk is implemented to fullfil a
certain purpose.  And certain implementation compromises and design
choices are always made.  When IdentityDictionary was implemented some
of the compromises and design choices that were made were:

1)  Speed over Space.  Hashing was used for fast speed at the cost of
additional space.

2)  A single object with indexable instance variables rather than some
'tree' based representation that costs more objects and where the
benefits only become apparent when a large number of items are in the
Dictionary.

The class was not meant to necessarily be the perfect implementation of
IdentityDictionary for all possible uses.

The answer to your problem should be obvious now.  Just change
IdentityDictionary or make a new kind of IdentityDictionary.  Simple.

Now, of course the question comes up, what new design compromises and
design choices do you want to make for this new class?

a)  Do you want to change the implementation of IdentityDictionary
itself to use, say a balanced tree type representation?

b)  Do you want IdentityDictionarys to turn into
LargeIdentityDictionarys when they get a certain number of elements in
them (much like a SmallInteger turns into a LargePositiveInteger when
it gets big enough).  Then the LargeIdentityDictionary can use a
different representation for storage that is more appropriate when you
have large numbers of elements.

Regardless of the first these choices, what new representation do you
want?  Multiple Arrays?  Trees of IdentityDictionarys  Self balancing
splay trees of node elements?

With each decision you are making new design decisions for your new
kind of IdentityDictionary.  Your new class will serve some particular
need.  IdentityDictionary was constructed to serve the most common need
for a class like this.  A fast, relative efficient IdentityDictionary.
It didn't promise to be all things to all people.

We chose to write a new kind of IdentityDictionary called
LargeIdentityDictionary.  IdentityDictionarys turned themselves into
LargeIdentityDictionarys when they've got move than 16000 elements in
them.  And the converse also happens, if a LargeIdentityDictionary gets
less than 8000 elements in it, it turns itself back into a normal
IdentityDictionary.

A LargeIdentityDictionary uses a different method of storing elements
which has no limit on the number of elements.  What you pay for this is
slightly slower access times.  Access time goes from O(constant) to
O(log n).  Still vary fast but slower than the hashing scheme employed
by normal IdentityDictionarys.

Hope this sheds some light.