Path: utzoo!utgpu!news-server.csri.toronto.edu!hub.toronto.edu!thomson Newsgroups: comp.lang.misc From: thomson@hub.toronto.edu (Brian Thomson) Subject: Re: A very brief history of optimal sorting methods Message-ID: <1990Nov29.181008.11892@jarvis.csri.toronto.edu> Organization: University of Toronto References: <5488@lanl.gov> <6913:Nov1008:23:5690@kramden.acf.nyu.edu> <16709:Nov1113:56:2390@kramden.acf.nyu.edu> <237@smds.UUCP> <855@mgt3.sci.UUCP> Date: 29 Nov 90 23:10:08 GMT Lines: 52 In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes: >> Sorry to inject some mathematics into this lovely flamefest, but... >> The bound on sorting is O(N log N) comparisons because you must be able to >> generate all of the N! possible permutations of the N inputs. > >That bound is only valid if the only permitted operation is comparison. Since one Turing Award winner has already contributed to this thread, I will quote another: "Note that radix sort [of n numbers], which is sometimes said to be linear time, really requires at least n log n steps, if one assumes the input numbers have enough digits so that they all can be distinct." S. A. Cook, "An Overview of Computational Complexity" 1982 Turing Award Lecture This is, of course, a restatement of the "each key must be at least log n digits long" argument propounded by Mark Carroll. The only part about this that worries me is that, if it is "fair ball" to consider the length of the keys for radix sort, then why do we not consider each comparison in the comparison-based sorts to be log n steps, yielding O(n log^2 n) cost? Well, the answer is that we do have to, according to another expert: "Actually, as N approaches infinity, a better indication of the time needed to sort is N (log N)**2, if the keys are distinct, since the size of the keys must grow at least as fast as log N; but for practical purposes, N never really approaches infinity." D. Knuth, The Art of Computer Programming Vol. 3, p. 5 He makes it clear later that he is talking about comparison-based sorts, and justifies it by requiring that the sort be based on an abstract ordering, for which a radix-based method will not work. It appears clear to me that "linear in the number of bytes" really is the most accurate characterization of the time complexity of radix sorting, and that it is not always the same as saying "n log n in the number of keys". But, if the objective is to compare radix-based and comparison-based sorting methods, which I believe was what sparked all this in the first place, we are faced with an apples-and-pomegranates problem. Complexity measures depend on 1) how you measure the size of the problem, and 2) how you measure the size of the computation (i.e., which computational model do you use). The conclusion I draw from all the smoke and fire we have seen here recently is that no two netters can agree on what set of assumptions to make to put comparison and radix sorts on the same playing field or, to unmix a metaphor, "in the same orchard". -- Brian Thomson, CSRI Univ. of Toronto utcsri!uthub!thomson, thomson@hub.toronto.edu