Path: utzoo!utgpu!news-server.csri.toronto.edu!hub.toronto.edu!thomson
Newsgroups: comp.lang.misc
From: thomson@hub.toronto.edu (Brian Thomson)
Subject: Re: A very brief history of optimal sorting methods
Message-ID: <1990Nov29.181008.11892@jarvis.csri.toronto.edu>
Organization: University of Toronto
References: <5488@lanl.gov> <6913:Nov1008:23:5690@kramden.acf.nyu.edu> <16709:Nov1113:56:2390@kramden.acf.nyu.edu> <237@smds.UUCP> <RANG.90Nov18150047@nexus.cs.wisc.edu> <855@mgt3.sci.UUCP>
Date: 29 Nov 90 23:10:08 GMT
Lines: 52

In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>> Sorry to inject some mathematics into this lovely flamefest, but...
>> The bound on sorting is O(N log N) comparisons because you must be able to
>> generate all of the N! possible permutations of the N inputs.
>
>That bound is only valid if the only permitted operation is comparison.

Since one Turing Award winner has already contributed to this thread,
I will quote another:

   "Note that radix sort [of n numbers], which is sometimes said to be
    linear time, really requires at least n log n steps, if one assumes
    the input numbers have enough digits so that they all can be distinct."

		    S. A. Cook, "An Overview of Computational Complexity"
                        1982 Turing Award Lecture

This is, of course, a restatement of the "each key must be at least log n
digits long" argument propounded by Mark Carroll.  The only part about this
that worries me is that, if it is "fair ball" to consider the length of the
keys for radix sort, then why do we not consider each comparison in the
comparison-based sorts to be log n steps, yielding O(n log^2 n) cost?
Well, the answer is that we do have to, according to another expert:

  "Actually, as N approaches infinity, a better indication of the time
   needed to sort is N (log N)**2, if the keys are distinct, since the
   size of the keys must grow at least as fast as log N; but for practical
   purposes, N never really approaches infinity."

		    D. Knuth, The Art of Computer Programming Vol. 3, p. 5

He makes it clear later that he is talking about comparison-based sorts,
and justifies it by requiring that the sort be based on an abstract
ordering, for which a radix-based method will not work.

It appears clear to me that "linear in the number of bytes" really is the
most accurate characterization of the time complexity of radix sorting,
and that it is not always the same as saying "n log n in the number of keys".
But, if the objective is to compare radix-based and comparison-based
sorting methods, which I believe was what sparked all this in the first
place, we are faced with an apples-and-pomegranates problem.

Complexity measures depend on 1) how you measure the size of the problem,
and 2) how you measure the size of the computation (i.e., which computational
model do you use).   The conclusion I draw from all the smoke and fire
we have seen here recently is that no two netters can agree on what
set of assumptions to make to put comparison and radix sorts on the
same playing field or, to unmix a metaphor, "in the same orchard".
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    utcsri!uthub!thomson, thomson@hub.toronto.edu