Path: utzoo!utgpu!watserv1!watmath!att!att!tut.cis.ohio-state.edu!cs.utexas.edu!yale!cmcl2!kramden.acf.nyu.edu!brnstnd From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein) Newsgroups: comp.lang.misc Subject: Re: Sorting bounds Message-ID: <29686:Nov2505:33:4290@kramden.acf.nyu.edu> Date: 25 Nov 90 05:33:42 GMT References: <1990Nov20.202143.10746@craycos.com> <6327:Nov2022:30:2490@kramden.acf.nyu.edu> <1990Nov21.211000.11359@craycos.com> Organization: IR Lines: 114 Rehash of previous exposition in more detail. Again, skip if you know why radix sort does not obey the n log n bound. In article <1990Nov21.211000.11359@craycos.com> jrbd@craycos.com (James Davies) writes: > In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > >In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes: > >> Sorry to inject some mathematics into this lovely flamefest, but... > >> The bound on sorting is O(N log N) comparisons because you must be able to > >> generate all of the N! possible permutations of the N inputs. Under the banner of ``mathematics'' that I wish you would stop flying, you claim here that all (general) sorting methods take time at least cN log N to sort N numbers, where c is some positive number depending on the method. If this is an inaccurate translation of your quoted statement, please let me know. That claim is entirely false. You have not justified it. It is contradicted by all variants of MSD radix sort, which are linear in the number of bits in the inputs; the number of bits can be asymptotically less than cN log N for all c. Please stop insulting mathematics. > >That bound is only valid if the only permitted operation is comparison. > If you define "pick one of N buckets" as one operation, sorting is indeed > linear. I'll maintain that your operation takes O(log N) time to perform. I'm sorry I didn't dismiss this strongly enough in my first response. It is absolutely irrelevant how long it takes to pick one of N buckets. All radix sort uses is picking one of a constant number (you said K at first) of buckets. For definiteness, let's settle upon K as being 256 for all time. It takes constant time, in theory and in practice, to select one of 256 buckets. > >> N! grows approximately as N**N (Stirling's approximation, I believe it's > >> called). > >No, n does not grow as n^n. Stirling's (first) approximation is that n! > >is about n^n/e^n times the square root of 2 pi n. > Irrelevant, invalid, etc. (substitute your favorite dismissal word here.) > log(N!) still grows as N log N. You said you were introducing some mathematics into this discussion. I was correcting your errors, one by one. Here you began a supposed proof that log(n!) grows as n log n; but your first statement, namely that n! grows as n^n, was false. The conclusion is true but you still haven't proven it. (Given that n! is larger than n^n/e^n times the square root of 2 pi n [see Abramowitz & Stegun], log n! is larger than n times the log of n/e. When n is greater than 3, n/e is greater than the square root of n, so log n/e is greater than half log n. Hence log n! is at least half n log n for large n.) > >> Radix or bucket sorts may appear to be linear, but in fact they are also > >> N log N in comparisons. (Picking one of K buckets requires a K-way > >> comparison.) > >Picking one of K buckets does not require a K-way comparison. Even if it > >did, that would not be relevant. Radix sort is linear in the number of > >bytes, which can be o(n log n) if there are many duplicate keys. > I was talking about sorting N items, not N bytes. Yes, and I was following your terminology. As I said before, the number of bytes can be o(n log n). If you attempt to prove your statement that radix sort uses at least cn log n comparisons, you will find an uncrossable gap at this point. > Picking one of K buckets is indeed equivalent to a K-way comparison. > That it appears to be "one" operation is merely an illusion, created > mainly by fixed address lengths of real machines and parallel addressing > hardware. This is probably your big mental block, Dan - you don't realize > what is really going on here. Fine, believe what you want. Even if it took K operations to pick one of K buckets, that would be absolutely irrelevant: K is a *constant*. Make it 2 if you want. > It would be equally valid to assert that sorting takes one "sort" unit, > so that sorting can be done in constant time. I was sticking to your terminology. You claim that radix sort is theta(n log n) where n is the number of keys. (Here theta is the inverse of big O.) That claim is simply false. > And we're talking real asymptotic here, not "real world" > asymptotic or whatever your concept is. I am talking about real asymptotics on a large class of (realistic) computational models. Once you see Knuth make a simply false statement about ``real-world'' asymptotics, you learn to pay attention to those log factors. In this problem there's no log factor. > BTW, what does duplicate keys have to do with it? If all the keys are > duplicate, then sorting is truly constant time. What are you talking about? By definition, a general sorting method cannot know such information about the keys beforehand. Furthermore, nobody is talking about all keys being the same. > >The scale factor between sorting 100 of Ritchie's boxcars and just 1 of > >them is almost exactly 100---internal sorting by far dominates any > >merging. > Just as an exercise, why don't you try calculating the merge time? As a matter of fact, I did calculate it, and I presented the results in my followup to his article. Just as an exercise, why don't you go back and read my followup before you assign any further exercises? > (but then again, I wouldn't expect you to do something so rigorous - > it's easier to wave your hands and say "merging is linear!"). Why the ad hominem attacks? I haven't seen you show any respect for mathematical rigor. Merging is not linear, but it is, even for such ridiculously large problems as Ritchie's trainload of tapes, dominated by sorting. (David Chase has pointed out in e-mail that a distributed solution might tip the scale the other way, but I don't believe any distributed solution will be cost-effective.) ---Dan