Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uakari.primate.wisc.edu!pikes!aspen.craycos.com!jrbd From: jrbd@craycos.com (James Davies) Newsgroups: comp.lang.misc Subject: Re: Sorting bounds Message-ID: <1990Nov21.211000.11359@craycos.com> Date: 21 Nov 90 21:10:00 GMT References: <1990Nov20.202143.10746@craycos.com> <6327:Nov2022:30:2490@kramden.acf.nyu.edu> Organization: Cray Computer Corporation Lines: 63 In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >Skip if you know why radix sort does not observe the n log n bound. (I'm assuming Dan is the only one who skipped this...) >In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes: >> Sorry to inject some mathematics into this lovely flamefest, but... >> The bound on sorting is O(N log N) comparisons because you must be able to >> generate all of the N! possible permutations of the N inputs. > >That bound is only valid if the only permitted operation is comparison. If you define "pick one of N buckets" as one operation, sorting is indeed linear. I'll maintain that your operation takes O(log N) time to perform. > >> N! grows approximately as N**N (Stirling's approximation, I believe it's >> called). > >No, n does not grow as n^n. Stirling's (first) approximation is that n! >is about n^n/e^n times the square root of 2 pi n. Irrelevant, invalid, etc. (substitute your favorite dismissal word here.) log(N!) still grows as N log N. > >> Radix or bucket sorts may appear to be linear, but in fact they are also >> N log N in comparisons. (Picking one of K buckets requires a K-way >> comparison.) > >Picking one of K buckets does not require a K-way comparison. Even if it >did, that would not be relevant. Radix sort is linear in the number of >bytes, which can be o(n log n) if there are many duplicate keys. I was talking about sorting N items, not N bytes. Picking one of K buckets is indeed equivalent to a K-way comparison. That it appears to be "one" operation is merely an illusion, created mainly by fixed address lengths of real machines and parallel addressing hardware. This is probably your big mental block, Dan - you don't realize what is really going on here. It would be equally valid to assert that sorting takes one "sort" unit, so that sorting can be done in constant time. Memory access times don't remain constant for an arbitrary number of items (any more than sorting times do). And we're talking real asymptotic here, not "real world" asymptotic or whatever your concept is. BTW, what does duplicate keys have to do with it? If all the keys are duplicate, then sorting is truly constant time. This is too realistic a restriction when talking about a theoretical problem. > >The scale factor between sorting 100 of Ritchie's boxcars and just 1 of >them is almost exactly 100---internal sorting by far dominates any >merging. Just as an exercise, why don't you try calculating the merge time? Assume constant access time to the data on the tapes (i.e. no delays for operator intervention), and also assume that your internal memory won't hold more than a few tapes at a time. You might be surprised (but then again, I wouldn't expect you to do something so rigorous - it's easier to wave your hands and say "merging is linear!").