Path: utzoo!utgpu!watserv1!watmath!att!att!tut.cis.ohio-state.edu!cs.utexas.edu!yale!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.lang.misc
Subject: Re: Sorting bounds
Message-ID: <29686:Nov2505:33:4290@kramden.acf.nyu.edu>
Date: 25 Nov 90 05:33:42 GMT
References: <1990Nov20.202143.10746@craycos.com> <6327:Nov2022:30:2490@kramden.acf.nyu.edu> <1990Nov21.211000.11359@craycos.com>
Organization: IR
Lines: 114

Rehash of previous exposition in more detail. Again, skip if you know
why radix sort does not obey the n log n bound.

In article <1990Nov21.211000.11359@craycos.com> jrbd@craycos.com (James Davies) writes:
> In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
> >> Sorry to inject some mathematics into this lovely flamefest, but...
> >> The bound on sorting is O(N log N) comparisons because you must be able to
> >> generate all of the N! possible permutations of the N inputs.

Under the banner of ``mathematics'' that I wish you would stop flying,
you claim here that all (general) sorting methods take time at least cN
log N to sort N numbers, where c is some positive number depending on
the method. If this is an inaccurate translation of your quoted
statement, please let me know.

That claim is entirely false. You have not justified it. It is
contradicted by all variants of MSD radix sort, which are linear in the
number of bits in the inputs; the number of bits can be asymptotically
less than cN log N for all c. Please stop insulting mathematics.

> >That bound is only valid if the only permitted operation is comparison.
> If you define "pick one of N buckets" as one operation, sorting is indeed
> linear.  I'll maintain that your operation takes O(log N) time to perform.

I'm sorry I didn't dismiss this strongly enough in my first response. It
is absolutely irrelevant how long it takes to pick one of N buckets. All
radix sort uses is picking one of a constant number (you said K at
first) of buckets. For definiteness, let's settle upon K as being 256
for all time. It takes constant time, in theory and in practice, to
select one of 256 buckets.

> >> N! grows approximately as N**N (Stirling's approximation, I believe it's
> >> called).
> >No, n does not grow as n^n. Stirling's (first) approximation is that n!
> >is about n^n/e^n times the square root of 2 pi n.
> Irrelevant, invalid, etc. (substitute your favorite dismissal word here.)
> log(N!) still grows as N log N.

You said you were introducing some mathematics into this discussion. I
was correcting your errors, one by one. Here you began a supposed proof
that log(n!) grows as n log n; but your first statement, namely that n!
grows as n^n, was false. The conclusion is true but you still haven't
proven it. (Given that n! is larger than n^n/e^n times the square root
of 2 pi n [see Abramowitz & Stegun], log n! is larger than n times the
log of n/e. When n is greater than 3, n/e is greater than the square
root of n, so log n/e is greater than half log n. Hence log n! is at
least half n log n for large n.)

> >> Radix or bucket sorts may appear to be linear, but in fact they are also
> >> N log N in comparisons.   (Picking one of K buckets requires a K-way 
> >> comparison.)
> >Picking one of K buckets does not require a K-way comparison. Even if it
> >did, that would not be relevant. Radix sort is linear in the number of
> >bytes, which can be o(n log n) if there are many duplicate keys.
> I was talking about sorting N items, not N bytes.

Yes, and I was following your terminology. As I said before, the number
of bytes can be o(n log n). If you attempt to prove your statement that
radix sort uses at least cn log n comparisons, you will find an
uncrossable gap at this point.

> Picking one of K buckets is indeed equivalent to a K-way comparison.
> That it appears to be "one" operation is merely an illusion, created
> mainly by fixed address lengths of real machines and parallel addressing
> hardware.  This is probably your big mental block, Dan - you don't realize
> what is really going on here.

Fine, believe what you want. Even if it took K operations to pick one of
K buckets, that would be absolutely irrelevant: K is a *constant*. Make
it 2 if you want.

> It would be equally valid to assert that sorting takes one "sort" unit,
> so that sorting can be done in constant time.

I was sticking to your terminology. You claim that radix sort is
theta(n log n) where n is the number of keys. (Here theta is the inverse
of big O.) That claim is simply false.

> And we're talking real asymptotic here, not "real world" 
> asymptotic or whatever your concept is.

I am talking about real asymptotics on a large class of (realistic)
computational models. Once you see Knuth make a simply false statement
about ``real-world'' asymptotics, you learn to pay attention to those
log factors. In this problem there's no log factor.

> BTW, what does duplicate keys have to do with it?  If all the keys are
> duplicate, then sorting is truly constant time.

What are you talking about? By definition, a general sorting method
cannot know such information about the keys beforehand. Furthermore,
nobody is talking about all keys being the same.

> >The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
> >them is almost exactly 100---internal sorting by far dominates any
> >merging. 
> Just as an exercise, why don't you try calculating the merge time?

As a matter of fact, I did calculate it, and I presented the results in
my followup to his article. Just as an exercise, why don't you go back
and read my followup before you assign any further exercises?

> (but then again, I wouldn't expect you to do something so rigorous - 
> it's easier to wave your hands and say "merging is linear!").

Why the ad hominem attacks? I haven't seen you show any respect for
mathematical rigor. Merging is not linear, but it is, even for such
ridiculously large problems as Ritchie's trainload of tapes, dominated
by sorting. (David Chase has pointed out in e-mail that a distributed
solution might tip the scale the other way, but I don't believe any
distributed solution will be cost-effective.)

---Dan