Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-crg!nike!oliveb!glacier!navajo!ali
From: ali@navajo.STANFORD.EDU (Ali Ozer)
Newsgroups: net.micro.amiga
Subject: File random access times.
Message-ID: <804@navajo.STANFORD.EDU>
Date: Tue, 26-Aug-86 20:58:58 EDT
Article-I.D.: navajo.804
Posted: Tue Aug 26 20:58:58 1986
Date-Received: Wed, 27-Aug-86 20:58:24 EDT
Reply-To: ali@navajo.ARPA (Ali Ozer)
Organization: Stanford University
Lines: 92

[]

Using the Manx compiler, version 3.20a, I did some timings on random
file accesses. Here are some of the interesting results:

(The file was 128 kilobytes long. Each sequence was repeated 10 times, and the
 results given are the total time for the 10 iterations. 1.1 and 1.2 refer
 to the versions of Kickstart/Workbench used; 1.2 was beta 6. "Buf" means
 buffered I/O (using fopen, fseek, fread), "Unbuf" unbuffered I/O (using
 open, lseek, and read). After every seek, the indicated number of characters
 was read from the file. The timings do not include the time to open and
 close the files.)

   Sequence of seeks (i.e.,  Number of
   locations accessed)       chars read    Buf1.1  Unbuf1.1  Buf1.2  Unbuf1.2
   --------------------------------------------------------------------------
1. 0                                 0       0.4       0.3     0.4       0.3
                                     1       5.2       0.0     5.2       0.0
                                    10       5.5       0.0     5.5       0.0
                                  1000       6.1       5.2     6.1       5.2
   ---------------------------------------------------------------------------
2. 60000                             0       1.2       0.9     0.7       0.3
                                     1      14.0       0.2     5.7       0.0
                                    10      14.9       0.2     6.0       0.0
                                  1000      15.5      13.7     6.6       0.3
   ---------------------------------------------------------------------------
3. 120000                            0       1.6       1.0     0.8       0.4
                                     1      11.9       0.4     0.2       0.1
                                    10      12.8       0.4     0.2       0.1
                                  1000      13.4      11.9     0.8       0.2
   ---------------------------------------------------------------------------
4. 0, 1000, 2000                     0       1.1       1.1     8.8       8.7
                                     1       8.9       0.2     9.0       8.8
                                    10       9.1       0.2     9.0       8.8
                                  1000      11.0       8.8    10.9       9.1
   ---------------------------------------------------------------------------
5. 120000, 121000, 122000            0       3.5      25.6     1.2       0.7
                                     1      31.4      26.0     0.8       0.4
                                    10      32.4      26.0     0.8       0.4
                                  1000      34.2      26.2     2.6       0.8
   ---------------------------------------------------------------------------
6. 0, 10000, 20000                   0       1.1       1.2     9.1       9.1
                                     1      11.9       0.2    12.1       9.1
                                    10      12.2       0.3    12.1       9.1
                                  1000      14.0      11.9    14.0      12.2
   ---------------------------------------------------------------------------
7. 100000, 110000, 120000            0       8.8       8.9     9.5       9.6
                                     1      41.0       7.3    12.5       9.3
                                    10      42.6       7.3    12.5       9.2
                                  1000      44.4      38.1    14.3       9.7
   ---------------------------------------------------------------------------

From the above results, 1.2's seek functions seem much more uniform across
the file: it takes about the same time to access location 0 as location
100000, as rows "6." and "7." show. But take a look at rows "4." and "5.".
Under 1.2, accessing locations around 120000 is much faster than accessing
locations around 0. In fact, 1.1 is faster at accessing locations 0, 1000,
2000, while 1.2 is faster at accessing locations 120000, 121000, 122000.
Seems pretty random (although I do get the same results over and over).

In almost all cases, using unbuffered I/O seems to be faster. Even for purely
sequential reading (results not posted above), it seems that if you call
read() with a character count of 1000 or so, unbuffered I/O is faster.
(I guess in this case it becomes equivalent to doing your own buffering.)
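
To make the "doing your own buffering" remark concrete, here is a minimal
sketch of a sequential read that pulls the file in through read() in chunks
of 1000 characters. It assumes a POSIX-style read(); the names
read_all_chunked and CHUNK are made up for illustration, and this is not the
code used for the timings above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 1000  /* characters requested per read() call, as in the text */

/*
 * Read from the current position of fd to end of file, CHUNK characters
 * at a time. Returns the total number of characters read, or -1 on error.
 * *ncalls receives the number of read() calls that returned data.
 */
long read_all_chunked(int fd, long *ncalls)
{
    char buf[CHUNK];
    long total = 0;
    ssize_t n;

    assert(ncalls != NULL);
    *ncalls = 0;

    /* read() returns 0 at end of file and a negative value on error. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        total += (long)n;
        (*ncalls)++;
    }
    return (n < 0) ? -1 : total;
}
```

Each call hands a whole chunk to the caller at once, so the per-call overhead
is spread over up to 1000 characters, which is roughly what a buffered
library does behind fread().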

One final confusion in the above table is what happens if you read zero
characters. As rows "1.", "2.", and "3." show, sometimes reading zero
characters results in worse performance than reading 1 or even 10...
(more in the unbuffered case than in the buffered case).  Thinking maybe
read() is screwing up and actually reading some characters, I put an "if"
to prevent the call to read in cases where the number of chars was 0. Still,
I get the same results! The pseudocode is simply...

    get current tick
    do 10 times
      do the seek, complain if unsuccessful
      if (# of chars to read != 0) read, complaining if unsuccessful 
    get current tick, print time difference
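
For anyone who wants to try the same experiment, the loop above could be
sketched in C roughly as follows. This is an illustration only, not the code
used for the table: it assumes a POSIX-style system, uses clock_gettime() in
place of the "get current tick" step, and the names now_seconds,
time_unbuffered, time_buffered and ITERATIONS are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 10  /* each seek sequence is repeated 10 times */

/* Stand-in for "get current tick": monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Unbuffered variant: lseek() then read(). Returns seconds, or -1.0 on error. */
double time_unbuffered(int fd, const long *offsets, int noffsets, size_t nchars)
{
    char *buf = malloc(nchars ? nchars : 1);
    double start, elapsed;
    int i, j, ok = 1;

    assert(noffsets > 0);
    if (buf == NULL) return -1.0;

    start = now_seconds();
    for (i = 0; ok && i < ITERATIONS; i++) {
        for (j = 0; ok && j < noffsets; j++) {
            if (lseek(fd, (off_t)offsets[j], SEEK_SET) == (off_t)-1)
                ok = 0;                       /* seek failed */
            else if (nchars != 0 && read(fd, buf, nchars) != (ssize_t)nchars)
                ok = 0;                       /* short or failed read */
        }
    }
    elapsed = now_seconds() - start;
    free(buf);
    return ok ? elapsed : -1.0;
}

/* Buffered variant: fseek() then fread(). Returns seconds, or -1.0 on error. */
double time_buffered(FILE *fp, const long *offsets, int noffsets, size_t nchars)
{
    char *buf = malloc(nchars ? nchars : 1);
    double start, elapsed;
    int i, j, ok = 1;

    assert(noffsets > 0);
    if (buf == NULL) return -1.0;

    start = now_seconds();
    for (i = 0; ok && i < ITERATIONS; i++) {
        for (j = 0; ok && j < noffsets; j++) {
            if (fseek(fp, offsets[j], SEEK_SET) != 0)
                ok = 0;                       /* seek failed */
            else if (nchars != 0 && fread(buf, 1, nchars, fp) != nchars)
                ok = 0;                       /* short or failed read */
        }
    }
    elapsed = now_seconds() - start;
    free(buf);
    return ok ? elapsed : -1.0;
}
```

Row "4." of the table would correspond to calling either function with the
offsets 0, 1000, 2000 and nchars set to 0, 1, 10 and 1000 in turn. Note that
the read is skipped entirely when nchars is 0, matching the "if" in the
pseudocode.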

I don't understand at all why reading NO characters should take more time than
actually reading characters...  I'll be happy to post the code I used, if
anyone is interested, but it seems like I'm doing the right thing. Any ideas?

Of course, I guess there would be faster ways of doing disk I/O with the
low-level DOS functions (which I haven't learned about yet; there's so much
to read, so much to learn...). I would be interested in hearing what people
think is the fastest way of reading random-sized chunks of data (say 0-1k)
from random locations in a large (>100k) file.

Ali Ozer, Ali@Score.stanford.edu