Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!swrinde!gem.mps.ohio-state.edu!uakari.primate.wisc.edu!ames!uhccux!munnari.oz.au!mudla!ok From: ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) Newsgroups: comp.lang.c Subject: Re: compare strings, strcmp Keywords: strcmp,strings Message-ID: <2742@munnari.oz.au> Date: 18 Nov 89 07:08:30 GMT References: <4463@blake.acs.washington.edu> <11605@smoke.BRL.MIL> <214@isgtec.UUCP> Sender: news@cs.mu.oz.au Lines: 42 In article <214@isgtec.UUCP>, robert@isgtec.UUCP (Robert Osborne) writes: > In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: > >In article <4463@blake.acs.washington.edu> > > jimli@blake.acs.washington.edu (Jimmy Li) writes: > >>char array[10000][200]; > >>What is the fastest way to sort this array? > > > >Dump it into an external file and invoke the system sort utility on the file. > This is a REALLY bad way of sorting an array this big. ... He asked > for the FASTEST way, this shouldn't even have be given as an option. Oh ah, tried it, have you? In fact it is not a REALLY bad way of sorting. Depending on whether you have virtual memory, how much of it you have, what the page replacement algorithm is, how much real memory you have, &c &c sorting using an external file with a system sort utility may end up doing FEWER I/O operations than an internal sort. In fact in UNIX systems sort(1) uses an algorithm (merging) which is intrinsically faster than qsort(3)'s "quicker-sort" algorithm, but alas, the point at which it switches over to merging depends on memory size &c &c &c. A general point about questions like this: any answer we give is likely to be an answer to the wrong question. I find myself wondering why anyone wants to sort char array[10000][200]. Why exactly those number? Are the records really fixed length? How does anyone come by 200 characters without any substructure? If there is substructure, why not declare the array more explicitly and exploit the substructure in the sorting algorithm? For example, suppose that the records contain a field which is, for argument's sake, a date YYMMDD. By setting up a table of 366 buckets (one for each bucket) the records could be distributed among the buckets in O(N) time -- where N is the number of records -- and then the buckets could be sorted using a list merge on the rest of the fields. This would reduce the problem from O(N.lgN) cost to O(N.lg(N/365)), about a factor of 2.3 faster. Without knowing what Jimmy Li is really doing, we might give bad advice. Indeed, the very best way of ensuring that a table is sorted is to generate it in sorted order in the first place; without knowing what he is up to I can't tell whether the "null algorithm" can be used for his problem or not. There is another extremely important reason why Doug Gwyn's suggestion is a REALLY good one even if it proves slower in some cases. That is that the system sort utility provides you with a mini-language for specifying comparisons which can make it a lot easier to get your sort right.