Path: utzoo!attcan!uunet!samsung!crackers!m2c!jjmhome!smds!rh From: rh@smds.UUCP (Richard Harter) Newsgroups: comp.lang.misc Subject: Re: A very brief history of optimal sorting methods Summary: Rubbish Message-ID: <235@smds.UUCP> Date: 11 Nov 90 08:33:44 GMT References: <24149:Nov905:42:0990@kramden.acf.nyu.edu> <5488@lanl.gov> <6913:Nov1008:23:5690@kramden.acf.nyu.edu> Organization: SMDS Inc., Concord, MA Lines: 42 In article <6913:Nov1008:23:5690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > This is funny. Computer science really does progress backwards. > In the '60s, people knew that sorting was linear in the number of bytes > being sorted. The post office had already been using MSD radix sort for > a long time, though they probably never bothered to analyze its running > time. Knuth says that LSD became more popular for computers because in > MSD ``the multiplicity of piles becomes confusing''; others pointed > wisely to the initials LSD and smiled. [Much more material in this vein deleted.] Dan, me boy, you're having a wee bit of trouble telling your burro from yer burrow. If you have N records with distinct keys the effective length of the keys is log(N). MS* sorts and binary sorts (qsort, mergesort, whatever) must do O(N(log(N)) record (or pointer) moves. In either case the result is N*log(N) records moved. In principle an MS* sort need only examine N*log(N) bits whereas a binary sort must examine N*log(N)**2 bits; in practice this advantage is pretty nominal. The true advantage of MS* (but not LS*) sorts is that one can use more than two bins, changing the base of the log term. For example, a byte based MSB sort uses 256 bins which reduces the number of moves by a factor of 8. THERE ARE NO TRULY LINEAR SORTS. There are, however, quasi-linear sorts, i.e. sorts which are linear for a restricted range of N and some restrictions on the data. These sorts depend, ultimately, on the fact that arithmetic is (up to the word size) a parallel operation. The fastest possible sort requires an address calculation function which gives, for each key, the location in the sorted file that the record must be moved to. Given this one can sort in linear time -- provided that a record can be moved from one location to any other in constant time. The latter requirement cannot be satisfied for sufficiently large N (i.e. the file is larger than available memory). -- Richard Harter, Software Maintenance and Development Systems, Inc. Net address: jjmhome!smds!rh Phone: 508-369-7398 US Mail: SMDS Inc., PO Box 555, Concord MA 01742 This sentence no verb. This sentence short. This signature done.