Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!umcp-cs!cvl!umd5!zben
From: zben@umd5.UUCP (Ben Cranston)
Newsgroups: net.micro.amiga,net.unix-wizards
Subject: Re: Speed of seeks
Message-ID: <966@umd5.UUCP>
Date: Fri, 16-May-86 18:09:07 EDT
Article-I.D.: umd5.966
Posted: Fri May 16 18:09:07 1986
Date-Received: Sun, 18-May-86 15:26:02 EDT
References: <12593@ucla-cs.ARPA> <645@baylor.UUCP>
Reply-To: zben@umd5.UUCP (Ben Cranston)
Distribution: net
Organization: U of Md, CSC, College Park, Md
Lines: 41
Xref: linus net.micro.amiga:6896 net.unix-wizards:15065
Summary: Yet another SEEK implementation

In article <645@baylor.UUCP> peter@baylor.UUCP (Peter da Silva) writes:

>Incidentally, despite the poor design of the files a seek() does not have to
>read every sector... a mistake often made by library writers is to try to
>make seek offsets simple integers. According to the library, the argument
>to an absolute seek() (lseek(fd, off, 0) or lseek(fd, off, 2)) only needs
>to be the returned value from a tell() call: it may indeed be a magic cookie
>like a sector/offset pair (and in fact "magic cookie" is the way it's described
>in the manual). It is under RSX/11M and on the ATARI 800.

>This error is not restricted to relative newcomers: there's an IBM mainframe
>implementation of 'C' that copies all files into fixed record length files
>when you open them just so you can use UNIX-like seeks. If you want to do
>a UNIX-like seek, build UNIX-like files (either one long "record" or a bunch
>of maximum length records) so your offset calculations work. It's not
>meaningful to seek to an unknown depth in a text file or other weird file
>anyway.

The Software Tools NOTE/SEEK design uses two Fortran integers to store SEEK
addresses.  The predominant text data format on the Sperry 1100 system is a
variable-length record, with the record length in a four-byte header area.

My implementation of the Tools for the Sperry uses the first of the two
Fortran integers as the "character address within file" (i.e. 4 x word
address) and the second Fortran integer as the "character number within
this record", that is, how many characters to go back to reach the record
header.  The code uses this value to get "back in sync" after a random seek.
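
To make the two-integer scheme concrete, here is a rough sketch in Python
rather than the original Fortran.  It is not the Sperry code.  The on-disk
layout (a 4-byte big-endian length header, text padded with NULs to a 4-byte
word boundary) and the names build_file, note and seek are assumptions made
up for illustration only.

```python
import struct

HEADER = 4  # characters in the record-length header (per the post)
WORD = 4    # characters per word; padding to a word boundary is an assumption


def build_file(records):
    """Pack byte strings as length-prefixed, word-aligned records."""
    out = bytearray()
    for text in records:
        out += struct.pack(">I", len(text)) + text
        out += b"\0" * (-len(text) % WORD)  # pad up to the next word boundary
    return bytes(out)


def note(data, rec_index, char_index):
    """Return (char_addr, back) for one character of one record.

    char_addr is the character address within the file; back is how many
    characters to go back from there to reach the record header.
    """
    pos = 0
    for _ in range(rec_index):  # walk the headers up to the wanted record
        (length,) = struct.unpack_from(">I", data, pos)
        pos += HEADER + length + (-length % WORD)
    back = HEADER + char_index
    return pos + back, back


def seek(data, addr):
    """Resync on the record header, then return the rest of that record."""
    char_addr, back = addr
    header_at = char_addr - back  # get "back in sync"
    (length,) = struct.unpack_from(">I", data, header_at)
    end = header_at + HEADER + length
    return data[char_addr:end]
```

With records b"hello", b"seek me" and b"x", note(data, 1, 5) yields (21, 9),
and seeking there resumes reading at b"me" without scanning the file from
the start.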

This has the advantage that the first word of the address appears to be a
normally incrementing address, with gaps of 4-7 between records.  It would
be possible to optimize NOTE address storage: if one knew that the positions
stored would always be at the beginning of a record, and that the file was
always ASCII, one could keep just the first integer and supply "4" for the
second.
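
The storage optimization might look like the sketch below, again in Python
and again only an illustration.  The helper names compact and expand are
hypothetical, and the check in compact is an assumption about how one would
guard against positions that are not at the beginning of a record.

```python
HEADER = 4  # characters from the record header to the first data character


def compact(addr):
    """Keep only the first integer of a beginning-of-record address."""
    char_addr, back = addr
    if back != HEADER:
        raise ValueError("position is not at the beginning of a record")
    return char_addr


def expand(char_addr):
    """Rebuild the full two-integer address by supplying 4 for the second."""
    return (char_addr, HEADER)
```

So a stored 16 expands back to (16, 4), while a mid-record address such as
(21, 9) cannot be compacted and is rejected.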

Oh, and if the character code is "Fieldata" (tm) rather than ASCII then
the second word is negative.  For historical reasons only...
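
One way to read the sign convention, sketched in Python.  The post says only
that the second word is negative for Fieldata, so treating its magnitude as
the unchanged character count is an assumption, and both function names are
hypothetical.

```python
def pack_second_word(back, fieldata=False):
    """Encode the character count, negated when the file is Fieldata."""
    return -back if fieldata else back


def unpack_second_word(word):
    """Return (character count, is_fieldata) from the stored second word."""
    return abs(word), word < 0
```

This works because the count is never zero: even the first character of a
record sits 4 past the header, so the sign is always recoverable.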

-- 
"We're taught to cherish what we have   |          Ben Cranston
 by what we have no longer..."          |          zben@umd2.umd.edu
                          ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben