Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!linus!security!genrad!mit-eddie!mit-vax!eagle!harpo!floyd!clyde!ihnp4!inuxc!pur-ee!uiucdcs!uicsl!preece From: preece@uicsl.UUCP Newsgroups: net.unix-wizards Subject: Re: MAX FILES PER PROCESS PROBLEM - (nf) Message-ID: <4374@uiucdcs.UUCP> Date: Thu, 8-Dec-83 04:34:13 EST Article-I.D.: uiucdcs.4374 Posted: Thu Dec 8 04:34:13 1983 Date-Received: Sat, 10-Dec-83 22:00:56 EST Lines: 34 #R:ccieng5:-19600:uicsl:12500018:000:1720 uicsl!preece Dec 5 08:48:00 1983 [The following response to my distribution sort example was received by mail] But your example helps make the point that many-file algorithms almost certainly need to be replaced with better approaches. As soon as your client for the "distribution sort of alphabetic records" notices that the records filed under each letter are not sorted, he is likely to ask for that obvious improvement. Then what do you do, insist on using 26^N (where N ~ 10) open files? That is a disgusting sorting scheme! ---------- The respondent has apparently not read volume 3 of Knuth. After applying the distribution the individual sets can be sorted by whatever means and concatenated (not merged), since they are ordered (that is each item in set s is greater than each element in set s-1, less than each item in set s+1). The idea of gaining sort speed by subdividing is well known; Quicksort is an example of the method. Distribution sorting is convenient when the key lends itself to distribution. It is also possible to sort by distribution entirely by distributing on the least significant positions first, concatenating, and moving on the next most significant, etc. This may be familiar to you if you've ever sorted on a card sorting machine (most of the net is probably too young to have had that particular experience). With variable-length keys it may be convenient to distribute first by length, so that short keys are skipped in until we get to the key positions they include (my particular application is sorting text words for file inversion). At any rate, distribution sorting is well known and effective, the respondent is apparently ignorant as well as abrasive. scott preece ihnp4!uiucdcs!uicsl!preece