Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!rutgers!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.lang.misc
Subject: Re: Sorting
Message-ID: <29413:Nov1921:32:5190@kramden.acf.nyu.edu>
Date: 19 Nov 90 21:32:51 GMT
References: <11628@alice.att.com>
Organization: IR
Lines: 43

In article <11628@alice.att.com> dmr@alice.att.com (Dennis Ritchie) writes:
[ real words from a Real Programmer ]
> But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes,
> each containing 10,000,000 100-byte binary records
> without any substructure. I need to get these records sorted.
> Can someone quote me a cost and expected delivery time?

I'm afraid I'm not in a position to contract out NYU's computer time to
AT&T, but I'll try to answer your question.

Let me assume a mainframe with 100M memory usable for data, internal
speed about ten times a Sun 4, and a 10-terabyte 5000-disk farm.

To sort as much data as will fit into memory will take 20 seconds with
my current software, for a total CPU time of 23 days. The final merge
shuttles a total of 170 terabytes to and from disk, using approximately
5 days of CPU time and relatively little I/O time.

If we can read (by some miracle) 100K a second on a hundred separate
tape drives, reading all your data takes 11 days. Add another 11 days
for writing back to tape (though we suggest a better storage method).

Total: Only as much time as Noah spent in his ark. We'll give you a
guaranteed time of twelve weeks, expected nine.

The disk farm is the biggest cost: including controllers and an IBM to
handle it all, the disks could cost up to $10M initially. Assuming we do
this all the time and already have disks, you just pay something more
than maintenance costs plus floor space plus personnel plus CPU time for
two months.
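
The time figures above can be checked with a short sketch. The Python
below is not part of the original post. It reads "100M" as 10^8 bytes
and "100K" as 10^5 bytes per second, and it assumes the 170-terabyte
merge figure corresponds to 17 two-way merge passes over the full data
set, which the post does not state. The 5 days of merge CPU time are
taken from the post as given, not derived.

```python
import math

# Figures quoted in the post.
TAPES = 10_000
RECORDS_PER_TAPE = 10_000_000
RECORD_BYTES = 100
SECONDS_PER_LOAD = 20      # time to sort one memory-load of data
TAPE_DRIVES = 100

# Assumed interpretations of the post's units.
MEMORY_BYTES = 100 * 10**6        # "100M memory" read as 10^8 bytes
TAPE_BYTES_PER_SEC = 100 * 10**3  # "100K a second" read as 10^5 bytes/s

SECONDS_PER_DAY = 86_400

# Total data volume: 10^13 bytes, i.e. 10 terabytes.
total_bytes = TAPES * RECORDS_PER_TAPE * RECORD_BYTES

# Number of memory-sized runs produced by the in-memory sort phase.
runs = total_bytes // MEMORY_BYTES

# CPU time for the sort phase, in days (about 23).
sort_days = runs * SECONDS_PER_LOAD / SECONDS_PER_DAY

# Assumption: a two-way merge needs ceil(log2(runs)) passes, and each
# pass moves the whole data set once.
merge_passes = math.ceil(math.log2(runs))
merge_traffic_tb = merge_passes * total_bytes / 10**12

# Time to read every tape once with all drives running in parallel.
tape_read_days = total_bytes / (TAPE_DRIVES * TAPE_BYTES_PER_SEC) / SECONDS_PER_DAY

print(f"data volume:   {total_bytes / 10**12:.0f} TB")
print(f"sorted runs:   {runs}")
print(f"sort phase:    {sort_days:.1f} days")
print(f"merge passes:  {merge_passes} ({merge_traffic_tb:.0f} TB moved)")
print(f"tape read:     {tape_read_days:.1f} days (same again to write back)")
```

Under these assumptions the sketch gives 100,000 runs, roughly 23 days
of sorting, 170 terabytes of merge traffic, and between 11 and 12 days
for each tape pass, in line with the numbers quoted above.
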
So (without any experience with how much a company might actually charge
for a project of such a size) I'd estimate nine weeks and $1,000,000.

> This is just the start of a burgeoning business; next week
> I expect a full trainload of 100 such boxcars.
> What will this cost and how long will it take your company
> to handle it?

Approximately 100 times the cost and approximately 100 times the time.
However, a 500000-disk farm is much more than 100 times less realistic
than a 5000-disk farm; and I don't think anyone's going to embark on a
project taking a couple of decades to complete.

---Dan