Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!mailrus!uflorida!novavax!proxftl!bill From: bill@proxftl.UUCP (T. William Wells) Newsgroups: comp.lang.c Subject: Re: Value of microeffiency Message-ID: <492@proxftl.UUCP> Date: 19 Jul 88 08:13:47 GMT References: <163@navtech.uucp> <2775@ttrdc.UUCP> <164@navtech.uucp> <8238@brl-smoke.ARPA> Organization: Proximity Technology, Ft. Lauderdale Lines: 78 In article <8238@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: ) In article <449@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes: ) > Once upon a time I got very tired of the slowness of the ) > UNIX sort program. So I wrote a clone of it. I invented a ) > whole new algorithm for it which you could call a ) > linked-list merge sort. It beat every other sort method ) > that I tested hands-down. Its only drawback is the ) > memory for the linked list. (Actually, I'd bet that ) > someone invented this sort before me, but at the time, ) > I'd never heard of anything like it.) ) ) You don't actually need linked lists for this method; arrays of ) pointers to the records will do fine. That is known as "list merge" ) sorting (see Knuth vol. 3), and it is my favorite method. Actually, my sort method did not use a linked list per-se; instead, it used an array of pointers into that array, the index into the array was the index into the items being sorted. And, while this sort is very similar to the list-merge method, there are some differences. You'll see what I mean when I publish the code. ) I, too, reimplemented the sort utility, although my motivation was ) to sort binary records instead of text lines. As I iteratively ) improved it, it evolved into purely merge sorting, using external ) "run" files when necessary and sorting the internal buffer via ) list merge (originally this was done via heapsort). Someday I ) hope to polish this up and contribute it to the sources newsgroup. This does seem to be the canonical method of doing general sorts. Having heard of sorts which do not behave as if they were this kind of sort, I do wonder whether any netters have any comments on better ways of sorting large amounts of data? ) >Most people who defend the notion that there are good and bad ) >algorithms try to treat algorithms as if they exist in a vacuum. ) >This is Idealist hokum. Algorithms exist to serve purposes, the ) >quality of a given algorithm is simply how well it serves a ) >given purpose and thus can not be spoken of independently of ) >said purpose. ) ) Well said, but this should be tempered with the realization that ) code often survives its original intended purpose and gets pressed ) into service for unanticipated uses. Therefore it often pays off ) in the long run to select an algorithm that works well enough to ) not cause problems when this happens. Some sorting methods, e.g. ) bubble sort, are a real catastrophe when scaled up, so usually it ) is wise to pick a method that scales better. In general, this is true. However, it does not change the point of what I said, it just points out that one had better consider as part of the purpose while selecting an algorithm the future uses of the algorithm. Else, that algorithm will not be suited to its future use. As an aside, when I chose the bubble sort, the items being sorted were entries in an input transaction; since this was strictly limited by how much the user could type there was no chance that the bubble sort would become inappopriate. ) One of my pet peeves is code that assumes that everything will ) always fit into main memory, when it would have been only slightly ) more trouble to code a method that doesn't rely on that assumption. ) Of course, sometimes there is a huge performance difference, but ) not always -- especially when you consider that relying on virtual ) memory amounts to surrendering application control over external ) buffering to the operating system, which doesn't know what the ) best buffering strategy for the particular application is. This is right in line with one of my pet peeves: the presumption that, because the power of computers is growing by leaps and bounds, one can be as wasteful as one cares, and that somehow, the computers will get powerful enough to run your program in spite of the waste. Argh!!!! By the way, this isn't really C; where ought this kind of discussion go? I'd say comp.misc, but I am already embroiled in an idiot discussion there and do not want to move anything more over there unless I must.