Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!lll-crg!nike!oliveb!glacier!navajo!ali From: ali@navajo.STANFORD.EDU (Ali Ozer) Newsgroups: net.micro.amiga Subject: File random access times. Message-ID: <804@navajo.STANFORD.EDU> Date: Tue, 26-Aug-86 20:58:58 EDT Article-I.D.: navajo.804 Posted: Tue Aug 26 20:58:58 1986 Date-Received: Wed, 27-Aug-86 20:58:24 EDT Reply-To: ali@navajo.ARPA (Ali Ozer) Organization: Stanford University Lines: 92 [] Using the Manx compiler, version 3.20a, I did some timings on random file accesses. Here are some of the interesting results: (The file was 128 kilobytes long. The sequence was repeated 10 times, and the results given are the total time for the 10 iterations. 1.1 and 1.2 refer to the versions of Kickstart/Workbench used. (1.2 was beta 6.) "Buf" means buffered I/O (using fopen, fseek, fread), "Unbuf" unbuffered I/O (using open, lseek, and read). After every seek, the indicated number of characters were read from the file. The timings to do not include the time to open and close the files.) Sequence of seeks (ie, Number of locations accessed) chars read Buf1.1 Unbuf1.1 Buf1.2 Unbuf1.2 -------------------------------------------------------------------------- 1. 0 0 0.4 0.3 0.4 0.3 1 5.2 0.0 5.2 0.0 10 5.5 0.0 5.5 0.0 1000 6.1 5.2 6.1 5.2 --------------------------------------------------------------------------- 2. 60000 0 1.2 0.9 0.7 0.3 1 14.0 0.2 5.7 0.0 10 14.9 0.2 6.0 0.0 1000 15.5 13.7 6.6 0.3 --------------------------------------------------------------------------- 3. 120000 0 1.6 1.0 0.8 0.4 1 11.9 0.4 0.2 0.1 10 12.8 0.4 0.2 0.1 1000 13.4 11.9 0.8 0.2 --------------------------------------------------------------------------- 4. 0, 1000, 2000 0 1.1 1.1 8.8 8.7 1 8.9 0.2 9.0 8.8 10 9.1 0.2 9.0 8.8 1000 11.0 8.8 10.9 9.1 --------------------------------------------------------------------------- 5. 120000, 121000, 122000 0 3.5 25.6 1.2 0.7 1 31.4 26.0 0.8 0.4 10 32.4 26.0 0.8 0.4 1000 34.2 26.2 2.6 0.8 --------------------------------------------------------------------------- 6. 0, 10000, 20000 0 1.1 1.2 9.1 9.1 1 11.9 0.2 12.1 9.1 10 12.2 0.3 12.1 9.1 1000 14.0 11.9 14.0 12.2 --------------------------------------------------------------------------- 7. 100000, 110000, 120000 0 8.8 8.9 9.5 9.6 1 41.0 7.3 12.5 9.3 10 42.6 7.3 12.5 9.2 1000 44.4 38.1 14.3 9.7 --------------------------------------------------------------------------- From the above results, it seems like 1.2's seek functions are much more uniform across the file, meaning it takes about the same time to access location 0 vs. location 100000, as rows labelled "6." and "7." show. But, take a look at rows "4." and "5.". Accessing locations around 120000 is much faster than accessing locations around 0... In fact, 1.1 is faster in accessing locations 0, 1000, 2000, while 1.2 is faster accessing locations 120000, 121000, 122000... Seems pretty random (although I do get the same results over and over.) In almost all cases, using unbuffered I/O seems to be faster. Even for purely sequential reading (results not posted above), it seems like if call read() with character count of 1000 or so, unbuffered I/O is faster (I guess in this case it becomes equivalent to doing your own buffering.) One final confusion in the above table is what happens if you read zero characters. As rows "1.", "2.", and "3." show, sometimes reading zero characters results in worse performance than reading 1 or even 10... (more in the unbuffered case than in the buffered case). Thinking maybe read() is screwing up and actually reading some characters, I put an "if" to prevent the call to read in cases where the number of chars was 0. Still, I get the same results! The psuedo code is simply... get current tick do 10 times do the seek, complain if unsuccessful if (# of chars to read != 0) read, complaining if unsuccessful get current tick, print time difference I don't understand at all why reading NO characters should take more time than actually reading characters... I'll be happy to post the code I used, if anyone is interested, but seems like I'm doing the right thing. Any ideas? Of course, I guess there would be faster ways of doing disk I/O, with low level Dos functions (which I haven't learned about yet --- there's so much to read, so much to learn...). I would be interested in seeing what people have to say about the fastest way of reading random sizes of data (say 0-1k) in from random locations in a large (>100k) file is... Ali Ozer, Ali@Score.stanford.edu