Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!uunet!fernwood!uupsi!sunic!isgate!krafla!aries From: aries@rhi.hi.is (Mimir Reynisson) Newsgroups: comp.sys.mac.programmer Subject: Re: How to best do a linked list on Mac Message-ID: <3288@krafla.rhi.hi.is> Date: 21 Jun 91 12:58:50 GMT References: <1CE00001.h0ibnk@tbomb.ice.com> Organization: University of Iceland Lines: 124 As has been noted "billions" of times before which method suits you best depends on what you intend to do (that is probably why there are some many different storage and retrieval methods available to-day). Anyways, here's my 0.02$ Below you'll find a brief summary of which method suits which data, my list is of course far from complete, but it should suffice to give an idea of these structures. One note before we start, I'm not really into hashing so I didn't include it here. There are quite a lot of reasons for discussing hashing, but as Clint Eastwood says: "right now I can't think of one" :) Fast access methods: Ordered Binary trees Very fast tree structure as long as the tree doesn't get too unbalanced. It the tree get's unbalanced it will be no better than a sequential search through a linked list. However, it's probably the simplest tree to implement and most of the time more than enough. Insertion and deletion are quite fast, so it should be very good for procedures that require fast insertion, deletion, and retrieval of relatively randomly entered data. Good for variable tables of simple language compilers, cross reference generators, and other tables of random size, which require fast insertion, deletion, and retrieval. 2-3 trees 2-3 trees are basically binary trees, except they can have 3 and sometimes 4 nodes instead of the usual 2. This as far as I can remember reduces the likely-hood of unbalanced trees. Haven't used this one in a long time, so I'm not quite sure. Balanced trees The complexity of the balancing operations suggests that balanced trees should be used only if infromation retrievals are considerably more frequent than insertions. This is particualarily true because the nodes of such search trees are usually implemented as densely packed records in order to economize storage. The problem is that empirical evaluations show that balanced trees lose much of their appeal if tight record packing is mandatory. It is indeed difficult to beat th straightforward, simple tree insertion algorithm :) Optimal trees Optimal search trees are hard to describe accuratly because as far as our consideration of data structure here goes, they have all based on the assumption that frequency of access is equal for all nodes, that is, all keys are equally probable to occur in a search. However, sometimes (really the exceptional cases) in which data about the probabilities of access to individual keys is available. If you have that data you should look into this structure; otherwise, this isn't a good place to look. B Trees and B+ Trees Quite an efficent data structure if the data is resonably random and most importantly that the data requires to be stored on an external storage device, like a hard-disk. It's really silly-as far as I'm concerned-to create a B Tree or a B+ Tree just for a memory resident data and not to mention horrible inefficent. B Trees are disk based data structures and quite good disk based data structure, but disk based data structures all the same. Look at 2-3 trees or binary trees if you need a memory based tree structure. Good for database index files and other disk based data. Sorted arrays Excellent for data that doesn't grow in size or if it does contains few items on the whole, because the entire array will have to be sorted or at least some part of it moved around in memory. Not ordered Static arrays Contains information that is of a fixed size or below the maximum size. Good for data that doesn't require a lot of space such as character strings, static tables, etc ... Should preferably not be used for structures that require searching unless the array has few items or the search is a rare procedure. Searching sequentially through 200 - 500 is not noticable unless the search is all the more frequent. Above 500 things start to get a little slow. Dynamic arrays Contains information that might grow or shrink over a period of time. Good for structure that don't have a lot of items, such as window record storage lists, but are added to irregularily or randomly and might cause heap fragmentation by other means. Slow access methods: Linked-lists Linked lists or linear lists are the simplest way to interrelate or link a set of elements. Insertion and deletion is fast. Searching, however, is linear except in the case of ordered list. The biggest disadvantage to this method is that it consumes a lot of pointers or handles if your each element is large. So if you have a list with few elements and each element isn't very large, a dynamic array is probably a better bet. A simple solution to solve heap fragmenation is to allocate your own heap and take care of the free-list yourself. This could be very fast if all the elements are of a fixed size and you somehow manage to resize your heap intelligently. Remember that the minimum number of a dynamically allocated block is 12 bytes, regardless of whether you use all these 12 bytes. So a linked list of integers whould be wasteful. Sets Sets are special structures that require that there be only one of each elements. They are usually a composition of several structure unless it is a bit set. I just thought I should mention them, since I was on a roll anyway. Bags (heaps) Since I mention sets I decided I should mention bags too :) Bags are sets basically, except for a minor detail. Bags are not sets, that is, they are allow multiple cases of the same key (or element). --- Hope this helps a little bit :) Mimir (aries@rhi.hi.is) Aries Software, Inc.