Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!rex!wuarchive!uunet!isc-br!hawk!wddami!wayned From: wayned@wddami.spoami.com (Wayne Diener) Newsgroups: comp.lang.c Subject: Re: Sorting Algorithm Message-ID: Date: 23 Aug 90 15:28:11 GMT References: <3612@goanna.cs.rmit.oz.au> Sender: uueagle@hawk.isc-br.com (Eagle Proj UUCP login) Lines: 51 >In article <3612@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes: >In article , >wayned@wddami.spoami.com (Wayne Diener) writes: [ text deleted ] >The C code presented is inordinately complicated, which makes it hard to >see what is going on. Basically, methods of inserting something into a >sequence using binary search fall foul of one of two problems: > - if you use an array, you can *get* anywhere fast, > but it takes O(N) time to move other stuff out of the > way to *insert* anything > - if you use a linked list, you can *insert* fast, > but it takes O(N) to *get* anywhere by following links. >(There is a compromise position: use a binary tree. Binary trees make >an excellent representation of sequences, and handle insertion well. >That would yield a variant of TreeSort.) Sorry if the code seems complicated. As I mentioned, I was a _rank_ amateur at the time. (Hopefully, I've moved up to _full_ amateur status by now.) I find the code easy to understand, but perhaps that's because I wrote it. > >Wayne Diener's version of the algorithm runs into the second problem. >To be sure, it is doing O(NlgN) *comparisons*, but in order to get anywhere >it has to follow pointers O(N**2) times. The bottle-neck is the code > > for (i = 0; i < middle; i++) { > current = up ? current->prev_word : current->next_word; > if (current == sortbound) { test = 1; break; } > } > >So it is an O(N**2) member of the family of insertion sorts. A couple of points here: 1) The _worst_ case number of pointer movements for each search is: N/2 + N/4 + N/8 + .... = N, and it has to do it N times, yielding O(N**2). However, I believe you'll find that the average case will yield O(K * log base 2 (N)) (where K is probably about 1.4) number of pointer movements per element. 2) Although it yields only a constant improvement (and doesn't change the Big O class), the time cost of a pointer movement is (as a normal rule) _substantially_ less than the time cost of the comparison. If the comparison is quite costly, then for reasonable numbers of N (say 10K to 50K max?) the efficiency of the sort would still be approximately O(N * Log(N)) _even IF the pointer movement is O(N**2)_. --