Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!mcgill-vision!snorkelwacker!apple!usc!samsung!munnari.oz.au!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: Prolog's Pitfalls Revisited
Message-ID: <3367@goanna.cs.rmit.oz.au>
Date: 3 Jul 90 07:09:39 GMT
References: <1990Jun26.170835.11037@chaos.cs.brandeis.edu> <3309@usceast.UUCP>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 53

In article <3309@usceast.UUCP>, mgv@usceast.UUCP (Marco Valtorta) writes:
> It seems that Harel does know that quicksort has been known to be inferior
> for nearly 30 years :-).  More seriously, I do not think that either
> Knuth or Sedgewick support the view that quicksort is "inferior."
> All experimental data (on collections of "random" arrays) I know 
> of (admittedly not recent data) points
> to quicksort being faster that mergesort on average.

The *worst* case behaviour of quicksort is not what I was referring to.
What I was referring to is the well-known fact that the average case of
quicksort does about 30% more comparisons than the *worst* case of
merge sort.  This information *IS* in Knuth Volume 3 and in Sedgewick's
"Algorithms" (at least the 1st edition, I haven't seen the second).

My reference to the known inferiority of quicksort was of course to
Hoare's 1961 paper in Computer Journal, where quicksort was first published.
He gave the formula there for the number of comparisons done by quicksort,
and the number of comparisons done by mergesort was known when the paper
was submitted in 1960 to be LESS.

Unfortunately, there is a tradition of performing sorting experiments
on arrays of integers.  This is certainly convenient for coding, because
then you can use the language's built-in "<" operator.  It is, however,
wildly unrealistic if you are NOT sorting things known at compile time
to be integers.  Now, if you KNOW at compile time that you are sorting
integers, then there are sorting algorithms with O(N) average cost (e.g.
radix sort).  If you don't know that you have integers, e.g. you are
sorting strings, then the cost of comparisons dominates, and quick sort
loses *badly*.

Sedgewick's paper in Acta Informatica contains detailed cost formulas.
It turns out that to get the favourable result for quicksort that everyone
has taken as Gospel, he assumes that a comparison costs 1/4 as much as
a memory reference.  In his dissertation, he was looking at code on a
/370 or that kind of machine, and charging a CR instruction as 1/4 of an
L instruction is not unreasonable.  But that ONLY applies when you know at
compile time that you are sorting integers.  It does not apply when you
are sorting strings or atoms or doing a procedure call for your comparison.

> I would like to know of references to data that supports the opposite view.

Look at any paper that compares quicksort and merge sort for anything
but integers.

It's worth pointing out that in a language which lacks arrays (such as
most Prologs and many functional languages) the space advantage of quick
sort does not exist in any form at all, and if you are sorting variable-
length records, there is no space advantage for quicksort in conventional
languages either.

-- 
Science is all about asking the right questions.  | ok@goanna.cs.rmit.oz.au
I'm afraid you just asked one of the wrong ones.  | (quote from Playfair)