Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!yale!cmcl2!lanl!jlg From: jlg@lanl.gov (Jim Giles) Newsgroups: comp.lang.misc Subject: Re: A very brief history of optimal sorting methods Message-ID: <5948@lanl.gov> Date: 14 Nov 90 20:02:08 GMT References: <4440:Nov1405:06:2590@kramden.acf.nyu.edu> Organization: Los Alamos Natl Lab, Los Alamos, N.M. Lines: 24 From article <4440:Nov1405:06:2590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): > [...] > Sorting is linear in the number of bytes being sorted. The length of the > keys is absolutely, entirely irrelevant to this conclusion. Radix sorting is linear in the number of bytes perhaps. It is also linear in the number of keys to be sorted _times_ the length of each key. The speed of sorting techniques is _uniformly_ given by _all_ the literature as a function of the number of _keys_ to be sorted, _not_ as a function of the number of bytes sorted. When the number of _bytes_ sorted is greater than N log N (where N is the number of _keys_), it is often better to use some other sort. If a whole key (like a 64-bit integer for example) can be compared in the same time as it takes to compare your little 8-bit chunk of it, and if the whole key can be loaded, moved, stored in _less_ time than it takes to _unpack_ just _one_ of your little 8-bit chunks, and if the keyspace is large (say, 64-bit integers) but sparcely populated (like you're only sorting 100 million of them), then radix sorting is usually at a severe deficit compared to other methods (quicksort, for example). Since all of the conditions above are met by _most_ sorting problems, radix sorting is at a deficit _most_ of the time. J. Giles