Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!sun-barr!decwrl!ucbvax!tut.cis.ohio-state.edu!cica!ctrsol!ginosko!uunet!kddlab!icot32!nttlab!utogw!kuno
From: kuno@gssm.otsuka.tsukuba.JUNET (Yasushi Kuno)
Newsgroups: comp.unix.wizards
Subject: Re: Algorithm needed: reading/writing a large file
Message-ID: <416@utogw.gssm.otsuka.tsukuba.JUNET>
Date: 12 Jul 89 08:38:07 GMT
References: <205@larry.sal.wisc.edu> <8137@bsu-cs.bsu.edu> <207@larry.sal.wisc.edu>
Sender: news@utogw.gssm.otsuka.tsukuba.JUNET
Lines: 25

jwp@larry.sal.wisc.edu (Jeffrey W Percival) writes:

>My question was about optimizing the process of rearranging a disk
>file according to a *given* mapping.

>One helpful person suggested reading sequentially and writing randomly,
>rather than vice-versa, and I tried that but it didn't help.  I guess
>the benefit gained from using the input stream buffering was cancelled
>out by the effective loss of the output stream buffering.

>The ever-attentive and always-reliable Mr Torek suggested looking
>into disk cache algorithms in OS journals, and verily that I will do,
>with thanks to those who responded.

Oh, that seems quite right, but how about this?  Open N files for
writing, read sequentially, and write each record to one of them so
that N output files form ordered bucket... i.e. every record in output
file 1 is smaller than output file 2, and so on.  (You can do this
because you already have mapping in the memory.)  Then original
problem reduce to sorting N smaller random files.  Apply this
recursively until each file fit into your memory.  N and pass number
should depend on your system.  Isn't this simple?  I'm curious if it
works fine or not, so let me know if you have tried this.

							Yasushi Kuno