Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!samsung!noose.ecn.purdue.edu!mentor.cc.purdue.edu!ajz From: ajz@mentor.cc.purdue.edu (T. Tim Hsu) Newsgroups: alt.sources Subject: Re: Sorting Algorithm Keywords: Sort Sorting C Source Linked-Lists Message-ID: <13245@mentor.cc.purdue.edu> Date: 23 Aug 90 07:47:17 GMT References: <15890@oolong.la.locus.com> Reply-To: ajz@mentor.cc.purdue.edu (T. Tim Hsu) Distribution: na Organization: Purdue University Lines: 62 Having recently taken a course where a 3 week project was based upon sorting, here are my notes... Here's the background, the sort was performed on four 9-digit integers where a seperate file told which of those 4 integers was the most significant key and which was the least. The total time spent in the program had to be less than 4 seconds for a 4000 item list (ours was about 1 second and although .15 seconds could have shaved off of it [ie, by killing all the error checking routines], I saw one that did it in .5 seconds). The first thing you should know is that any textbook routine like quicksort, mergesort, or even a radix sort failed to sort a random 4000 item list in under 4 seconds on our Gould NP-1. This is what my team learned... 1) Traversing a linked list to locate an item is very time consuming. 2) Dumb O(n^2) routines have very little overhead making them MUCH FASTER than O(n log n) routines (which have high overheads) for small values of n (<15). 3) An insertion sort [O(n^2)] is one of the fastest sorts for nearly sorted data especially when the unsorted items are close to each other. 4) Recursive routines took more time than non-recursive ones. The best routine I saw chain hashed the values into a 33333 item array, then it took the array and rebuilt a linked list out of it, and finally it ran an insertion sort on the list. For random data, there should be very few cases where more than 5% of the items in the rebuilt linked list will be out of place. Of these out of place items, they should be uniformly distributed within the linked list so that they don't have to be moved more than 3 spaces to place them back into their proper position. Since a 4000 item list will only fill the hash table to ~10%, there should be on average only 1.1 probes needed for each item. It should therefore take O(1.1n + n + c) where the second "n" comes from the insertion sort having to traverse the entire list at least once and the "c" is a constant depending upon how many items were out of place and how far they had to be moved on the average (shouldn't be more than 4000 * 5% * 3 = 600 or n * 5% * 3 = .15n). It should therefore average O(2.25n), have a best case of O(2n), and have a worst case of O(2n^2). The routine my team wrote insertion sorted every group of 12 integers, placed this group into an larger array, and then merge sorted this array. This gave a average time of O((n / 12) log (n / 12) + (12^2 / 4) * (n / 12)), a best case of O((n / 12) log (n / 12) + n), and a worst case of O((n / 12) log (n / 12) + (12^2 / 2) * (n / 12)). For 4000 items... First routine Second routine average time O(9000) O(14793.6) best time O(8000) O(6793.6) worst time O(3.2 * 10^7) O(26793.6) I should stress that the worst case for the second routine is pretty straight forward to find, but the worst case determination of the first routine would be quite a task. In practice, chances are the second routine will operate under unfavorable conditions many more times than the first routine. -- T. Tim Hsu UUCP ...pur-ee!ajz@mentor.cc.purdue.edu ARPA ajz@mentor.cc.purdue.edu FAX 1 317 494 0566 BITNET xajz@PURCCVM