Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site allegra.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!jpl
From: jpl@allegra.UUCP (John P. Linderman)
Newsgroups: net.unix-wizards
Subject: Re: Speed of seeks
Message-ID: <6122@allegra.UUCP>
Date: Mon, 19-May-86 10:58:34 EDT
Article-I.D.: allegra.6122
Posted: Mon May 19 10:58:34 1986
Date-Received: Wed, 21-May-86 01:01:01 EDT
References: <12593@ucla-cs.ARPA> <645@baylor.UUCP> <1535@umcp-cs.UUCP>
Reply-To: jpl@allegra.UUCP (John P. Linderman)
Distribution: net
Organization: AT&T Bell Laboratories, Murray Hill
Lines: 43
Summary:

With a little help from Chris Torek, Peter da Silva noted
>>According to the library, the argument to [fseek] needs to be the
>>returned value from [an ftell()] call: it may indeed be a magic cookie
>>like a sector/offset pair (and in fact "magic cookie" is the way it's
>>described in the manual). It is under RSX/11M and on the ATARI 800.
and Chris goes on to add
>Actually, those who write libraries such that fseek takes a byte
>index either have missed that point, or are very clever: Because
>PDP-11 and Vax Unix systems have always had fseek and ftell return
>byte indicies, there is no doubt quite a bit of code that depends
>on that behaviour. These programs are technically broken, but it
>is hard to keep a customer happy by informing it [note non-sexist
>word :-)] of this fact.

This points up the typical bind faced by those who wish to write
programs that are portable but not pathetic. I wanted to write a
sort program that checkpointed itself... after sorting and writing
a coreload of data, the data is (kind of) safe, and, if the system
crashes, it isn't necessary to sort it again, if we can just seek
past it in the input file.

In general, we don't know how many complete records will fit in a
coreload until we are part way through a record that is too big to
fit. If seek addresses are byte offsets, no big deal.
Just keep track of how many bytes have been read after each complete
record. But if seek addresses are magic cookies, bad news. We can
use ftell to find the address of the middle of the current,
incomplete, record, but that does us no good at all, since it is the
address of the START of the record that we need, and you cannot do
cookie arithmetic. Or, we can do an ftell after every complete
record, just in case the next record won't fit. Ughleee! That ftell
boils down to a system call, and after stdio goes to all the bother
of minimizing system calls, we come along and add one after every
record. I have tried this, and it results in performance that is
anywhere from 10 to 100 times as bad! Completely unacceptable.

In this particular case, there is a way out. Since restarting from
a checkpoint is rare, and since input may be coming from a pipe
anyway, it is both tolerable and preferable to get to byte offset N
by doing N getc's. (I like to call this a getceek.) But the
solution is peculiar to this problem, and I would have chosen to
``technically break'' the program if I had been forced to choose
between complete portability and acceptable performance.

John P. Linderman  Seek and Ye Shall Grind  allegra!jpl
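
The byte-counting and getceek ideas above can be sketched in C roughly
as follows. This is a minimal illustration, not the code from the sort
program: the function names getceek() and checkpoint_offset(), the
newline-terminated record format, and the `budget' parameter standing
in for a coreload are all assumptions made for the example.

```c
#include <stdio.h>

/*
 * getceek: reach byte offset n the slow, portable way, by reading and
 * discarding n characters from the current position.  It works on
 * pipes and never does arithmetic on an fseek/ftell value.
 * Returns the number of bytes actually skipped (less than n at EOF).
 */
long getceek(FILE *fp, long n)
{
    long skipped = 0;

    while (skipped < n && getc(fp) != EOF)
        skipped++;
    return skipped;
}

/*
 * Read up to `budget' bytes (standing in for one coreload), counting
 * bytes as we go, and remember the count after each complete record.
 * Records are assumed to be newline-terminated (an assumption of this
 * sketch).  Returns the byte count at the end of the last complete
 * record read, i.e. the value to hand to getceek() on a restart.
 */
long checkpoint_offset(FILE *fp, long budget)
{
    long consumed = 0;      /* bytes read so far, partial record included */
    long checkpoint = 0;    /* bytes read through the last complete record */
    int c;

    while (consumed < budget && (c = getc(fp)) != EOF) {
        consumed++;
        if (c == '\n')
            checkpoint = consumed;
    }
    return checkpoint;
}
```

On a restart, a caller would reopen the input and call getceek(fp, saved)
with the count recorded at the last checkpoint. No ftell or fseek value
is involved, so the same code works when the input arrives through a
pipe, at the cost of rereading every skipped byte.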