Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!caen!ox.com!math.fu-berlin.de!unido!sbsvax!cs.uni-sb.de!hubert
From: hubert@cs.uni-sb.de (Hubert Baumeister)
Newsgroups: comp.lang.smalltalk
Subject: Re: performance of IdentityDictionary in st80 r4
Message-ID: <11481@sbsvax.cs.uni-sb.de>
Date: 17 Jun 91 09:15:16 GMT
References: <HUGGINS.91Jun12092724@dino.ti.com>
Sender: news@sbsvax.cs.uni-sb.de
Distribution: comp
Organization: Max-Planck-Institut f"ur Informatik
Lines: 67

In article <HUGGINS.91Jun12092724@dino.ti.com>, huggins@ticipa.ti.com (Gray Huggins) writes:
|> Hi,
|>   We have noticed a big difference between populating an IdentityDictionary
|> with 15000 entries and one with 30000 entries.  
|> 
|>  - Why?
|>  - What can be done to improve performance?
|>  - Why do we get "a primitive has failed" when we do really big stuff without
|>    garbage collecting?
|> 
|>   Here is our test
|> 
|> 1)
|> | a |
|> a := IdentityDictionary new.
|> Time millisecondsToRun: [
|>         1 to: 15000 do: [ :i |
|>                 a at: DummyClass new put: i]]
|> 
|> Result => 11390 milliseconds
|> 
|> 2)
|> | a |
|> a := IdentityDictionary new.
|> Time millisecondsToRun: [
|>         1 to: 30000 do: [ :i |
|>                 a at: DummyClass new put: i]]
|> 
|> Result =>  1049351 milliseconds
|> 
|> Regards,
|> --
|> Gray Huggins                   Internet: huggins@ticipa.csc.ti.com
|> Texas Instruments                                          
|> PO Box 655012  M/S 3635        TI MSG:  GHUG
|> Dallas, TX 75265               Voice:   (214) 917-2202

When storing an association in an IdentityDictionary Smalltalk uses a hash function
on the key element. If there are collisions it looks for nearest free entry.
The result of the hash function defined in Object which is the one always used (except
for Characters and Smallintegers) is a 14 bit quantity. Thus the result of the
hash function is a value between 0 and 16383. If you create 15000 objects each object
gets a unique hash value. That happens in the first case. But if you create 30000 objects
then after 16384 objects the hash values repeat itself. Thus after 16384 objects there
are only collisions which are resolved by looking for the next free entry. But since
there are no 'holes' in the set of hash values, for each new entry about 7000 and more
elements have to be accessed.

A quick solution would be to redefine the method identityHash in a subclass of Object to
give a more suitable result. If you know that you will have 30000 elements in your
dictionary then modify the identityHash method of DummyClass as follows:
identityHash
	^super identityHash * 2
This will not avoid collisions after 16000 elements but in average an empty slot is found
after one search. In the example above the time increases only by a factor 3. But this
won't work for even higher numbers of elements.

Hubert

-- 
					Hubert Baumeister
					Max-Planck-Institut f"ur Informatik
					Im Stadtwald
					6600 Saarbr"ucken

					Telefon: (x49-681)302-5432
					e-mail: hubert@cs.uni-sb.de