Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!sri-unix!quintus!ok
From: ok@quintus.UUCP (Richard A. O'Keefe)
Newsgroups: comp.lang.c
Subject: Re: sorting (qsort)
Message-ID: <265@cresswell.quintus.UUCP>
Date: Mon, 23-Nov-87 20:19:48 EST
Article-I.D.: cresswel.265
Posted: Mon Nov 23 20:19:48 1987
Date-Received: Thu, 26-Nov-87 20:16:40 EST
References: <10491@brl-adm.ARPA>
Organization: Quintus Computer Systems, Mountain View, CA
Lines: 136
Summary: Party line

In article <10491@brl-adm.ARPA>, bard@THEORY.LCS.MIT.EDU (Bard Bloom)
replied to my question about the wide-spread (mis)use of "quick"sort.
He starts by saying that
> Quicksort -- if you code it just right -- has a very nice inner loop:
This is wildly, ludicrously, hilariously IRRELEVANT to generic sorting
algorithms.  What do I mean by a "generic" sorting algorithm?  I mean
one like the qsort() function in the ANSCI C draft which takes a user-
supplied function, or one like the sort(BU_CMD) command in the SVID
which takes a user-supplied sort key specification.  In fact, even if
you are sorting strings, the book-keeping in the inner loop is not of
much interest.  

There are about five different flavours of "oh" notation.  The one I
meant was the one where
	f(n) = o(g(n)) if lim[n->infinity] ((f(n)-g(n))/g(n)) = 0
that is, where f(n) is approximately equal to g(n) WHERE THE CONSTANT
FACTOR IS SIGNIFICANT.  I'm afraid this keyboard hasn't got thetas or
omegas, so I left it to common sense to work out what I meant.

Let me give you part of a table from Kurt Melhorn's "Data Structures
and Algorithms 1: Sorting and Searching", Springer-Verlag.
The table is on page 66.  I have omitted the column for A-sort, and
the row for run-time (because it is expressed in rather odd units,
and makes the inappropriate assumption that comparisons are cheap).

		  Single	
		  Maximum      HeapSort      QuickSort	  MergeSort
		  selection
# of comparisons+------------+------------+-------------+----------+
worst case	|  0.5 n**2  |  2 n lg n  | 0.5  n**2   |  n lg n  |
average case	|  0.5 n**2  |~ 2 n lg n  | 1.44 n lg n |  n lg n  |
		+------------+------------+-------------+----------+
storage		|  n         |  n         |  n + lg n   |  2n      |
requirement     |  + const   |  + const   |  + const    |  + const |
		+------------+------------+-------------+----------+
access  	|  random    |  random    |  random     | sq'ntial |
		+------------+------------+-------------+----------+
stable?         |  no        |  no        |  no         |  yes     |
		+------------+------------+-------------+----------+

The constant factors MATTER here.  The average case of heapsort,
quicksort, and merge sort is O(NlgN), BUT heapsort is twice as
costly as merge sort and quicksort is 1.44 times as costly.

The "storage requirement" line is misleading.  Suppose you are sorting
/etc/passwd on a PDP-11.  This is a typical application of a sorting
routine; I can count the number of times I have sorted a sequence of
numbers -- other than to test a sorting routine -- in the last eight
years on the fingers of no hands, but I sort files of text every day.
The /etc/passwd file here has 76 lines for a total of 4458 bytes.  The
overhead to enable merge-sort (on a PDP-11) is thus 76*sizeof (short)
bytes, which comes to 3%. Trading 3% more space for 30% to 40% less time
seems like a good bargain to me.  More generally, if you are sorting a
sequence of varying-length strings, you have to sort an array of
pointers if you want to use qsort(), and that's N locations of overhead,
whereas storing links with the strings so you can sort a list costs you,
oh my goodness, exactly N locations of overhead.  At least for
applications where you are sorting variable-sized objects, the alleged
space advantage of "quick" sort turns out to be bogus too.

Bloom quotes Sedgewick as advising that "if one is not willing to invest
the effort to be sure that a Quicksort implementation is not flawed,
Shellsort is a much safer choice."  True.  But also according to
Sedgewick {page 98} "The functional form of the running time of
Shellsort is not even known .... two conjectures are (lg N)**2 * N and
N**1.25."  I don't know what YOU think about a book which recommends a
complicated unanalysed method like Shellsort when there is a trivially
simple method (mergesort) which outperforms it -- not only in order of
magnitude but in constant factor -- but I know what *I* think.

Bloom says
> Most people don't worry about the constant factors, because they
> vary a lot with the model of computation.
When you are counting the number of times that some operation is done,
the model of computation has nothing to do with the case, tra-la.
This may be the heart of the matter:  if "most people don't worry about
the constant factors" that may explain why the qsort() routine in SunOS
3.2 was (31/19 - 1) = 63% slower at sorting an array of strings than a
naive implementation of merge sort in one particular test (using
strcmp() as the comparison function).  I expect the factor to be higher
on VAXen, where procedure calls are much more expensive, and less on
SPARCs, where procedure calls should be cheaper.  I have no reason to
believe that the qsort() routine in SunOS 3.2 is not identical to either
the 4.2BSD or the SYS V.2 one, and I doubt that Bloom has either.  If
the sorting routine in UNIX isn't "carefully tweaked", it ---- well
should be after all this time.  But perhaps nobody cared about constant
factors.  [I hasten to add that in general I have the greatest
admiration for the SUN operating system and for their machines, and
that I expect that they copied what they were told was the best code.]

The fact that the AVERAGE case of "quick" sort does about 1.4 times as
many comparisons as the WORST case of merge sort has been known ever
since quicksort was analysed.  The fact that the cost of calling a
user-supplied comparison routine completely swamps any reasonable
book-keeping has been obvious since the beginning.  The conclusion
which follows from this is that
	if the comparisons done by a sorting routine are
	single in-line machine instructions comparable in cost
	to an add or move,
	    then and only then quicksort is a good method
	    otherwise merge sort beats the pants off quicksort
However, this conclusion seems not to have been drawn.

Over the years I have measured the speeds of heapsort, quicksort,
and merge sort on DEC-10s (using Simula, Pop-2, and Prolog),
PDP-11s and VAX-11s (using C) and 680x0s (using C and Prolog).
There are wide variations between languages and machines, but
the ratio heapsort : quicksort : mergesort :: 2 : 1.4 : 1 holds
reasonably well within each machine/language combination.

I'm afraid that Bard Bloom has not provided evidence to refute this
conclusion, but has merely repeated the Party line.

The questions remain:
    (1) does anyone in net-land know of any references which purport
	to show that quick-sort does fewer comparisons than merge-sort?
    (2) does anyone in net-land know of any studies comparing actual
	implementations of quick-sort and merge-sort where the
	comparisons were not single in-line machine instructions but
	were procedure calls in which quick-sort outperformed merge-sort?
    (3) has anyone got an ancedote about an application which was
	originally coded using merge-sort but had to be recoded usin
	quick-sort (or any other in-place sort) to fit into a small
	machine?
    (4) does anyone know of a general-purpose sorting algorithm which
	consistently out-performs merge-sort?

If anyone wants to conduct their own studies, DON'T try your sorting
routine out on an array of numbers.  I suggest random permutations of
random subsets of /usr/dict/words as being more realistic.

I think this discussion does belong in *.lang.c, because presumably
people writing in C care about efficiency and portability, and if it
turns out (a) that qsort() is often implemented as some version of
quick-sort and (b) that quick-sort is as bad as I claim, then a lot
of us will be looking for something better to use.