Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!caen!ox.com!math.fu-berlin.de!unido!sbsvax!cs.uni-sb.de!hubert From: hubert@cs.uni-sb.de (Hubert Baumeister) Newsgroups: comp.lang.smalltalk Subject: Re: performance of IdentityDictionary in st80 r4 Message-ID: <11481@sbsvax.cs.uni-sb.de> Date: 17 Jun 91 09:15:16 GMT References: Sender: news@sbsvax.cs.uni-sb.de Distribution: comp Organization: Max-Planck-Institut f"ur Informatik Lines: 67 In article , huggins@ticipa.ti.com (Gray Huggins) writes: |> Hi, |> We have noticed a big difference between populating an IdentityDictionary |> with 15000 entries and one with 30000 entries. |> |> - Why? |> - What can be done to improve performance? |> - Why do we get "a primitive has failed" when we do really big stuff without |> garbage collecting? |> |> Here is our test |> |> 1) |> | a | |> a := IdentityDictionary new. |> Time millisecondsToRun: [ |> 1 to: 15000 do: [ :i | |> a at: DummyClass new put: i]] |> |> Result => 11390 milliseconds |> |> 2) |> | a | |> a := IdentityDictionary new. |> Time millisecondsToRun: [ |> 1 to: 30000 do: [ :i | |> a at: DummyClass new put: i]] |> |> Result => 1049351 milliseconds |> |> Regards, |> -- |> Gray Huggins Internet: huggins@ticipa.csc.ti.com |> Texas Instruments |> PO Box 655012 M/S 3635 TI MSG: GHUG |> Dallas, TX 75265 Voice: (214) 917-2202 When storing an association in an IdentityDictionary Smalltalk uses a hash function on the key element. If there are collisions it looks for nearest free entry. The result of the hash function defined in Object which is the one always used (except for Characters and Smallintegers) is a 14 bit quantity. Thus the result of the hash function is a value between 0 and 16383. If you create 15000 objects each object gets a unique hash value. That happens in the first case. But if you create 30000 objects then after 16384 objects the hash values repeat itself. Thus after 16384 objects there are only collisions which are resolved by looking for the next free entry. But since there are no 'holes' in the set of hash values, for each new entry about 7000 and more elements have to be accessed. A quick solution would be to redefine the method identityHash in a subclass of Object to give a more suitable result. If you know that you will have 30000 elements in your dictionary then modify the identityHash method of DummyClass as follows: identityHash ^super identityHash * 2 This will not avoid collisions after 16000 elements but in average an empty slot is found after one search. In the example above the time increases only by a factor 3. But this won't work for even higher numbers of elements. Hubert -- Hubert Baumeister Max-Planck-Institut f"ur Informatik Im Stadtwald 6600 Saarbr"ucken Telefon: (x49-681)302-5432 e-mail: hubert@cs.uni-sb.de