Newsgroups: comp.sys.cbm
Path: utzoo!utgpu!jarvis.csri.toronto.edu!godzilla.eecg.toronto.edu!leblanc
From: leblanc@eecg.toronto.edu (Marcel LeBlanc)
Subject: Re: SEQ file access speedup
Message-ID: <89Feb14.171816est.2394@godzilla.eecg.toronto.edu>
Summary: Interleave is important for standard routines on 1541/4040/8050
Keywords: Interleave, fast SEQ access
Organization: EECG, University of Toronto
References: <89Feb10.182100est.2732@godzilla.eecg.toronto.edu> <7124@killer.DALLAS.TX.US> <7143@killer.DALLAS.TX.US>
Date: Tue, 14 Feb 89 17:18:03 EST

In article <7143@killer.DALLAS.TX.US> elg@killer.Dallas.TX.US (Eric Green) writes:
>This is the results of benchmarking
>
>a) loading, and
>b) doing GETIN until EOF, from ML, doing nothing inbetween.
> ...
>My basic thought was that sequential file access can take place just
>as fast as LOAD'ing. The benchmark confirms that for IEEE drives and
>the standard 1541. There's a couple of constraints here. First of all,
...
>clrchn/chkin/clrchn/chkout for each individual byte -- a 4-to-1
>overhead). When you do that, SEQ access isn't slow at all.. just look
>at these timings:
>
>C-64, IEEE Flash, SFD-1001, 'load"bbs",8' : 18 seconds
>""""""""""""""""""""""""    ML loop, seq read: 18 seconds
                                                ^^^^
>C-64, C-LINK II, SFD-1001 LOAD: 14 seconds
>                          READ: 14 seconds
                                 ^^^^

Yes, times are identical. Please read on.

>in 64 mode, with 1571: load -- 60 seconds
>                       read -- 62 seconds

Almost identical. This supports what I said in my original posting on
this subject. Here's an excerpt:

   ... as much of a speed increase as is possible on LOADs. This has
   less to do with the transfer protocol, than with the LOW PERFORMANCE
   limitations of the C64 kernal. To remain compatible with ...

As you pointed out in an earlier posting, the standard C64 load routine
does nothing but repeatedly call ACPTR!
This is a LOW PERFORMANCE limitation when you have a decent transfer
protocol, but it's of no importance when you have to use the standard
serial protocol of the C64!

But then you say SFD-1001 (IEEE interface) isn't low performance? It is
reasonably fast, but since they just speed up ACPTR/CIOUT without
changing the LOAD routine, the ML loop that you have written should give
the same results as LOAD (since it's basically the same loop), and it
does. This DOESN'T mean that SEQ read is as fast as block transfers
(LOAD); it just means that you have to optimize ("speed up") the block
transfer software as well as the transfer protocol. This is even better
demonstrated by the following numbers:

>128 Ramdos: READ: 9 seconds
>64 Ramdos:  READ: 8 seconds

AND, as you stated before, LOAD is virtually instantaneous!

>128 -- 1571 -- load -- 8 seconds
>               read -- 26 seconds
>64 mode, with Epyx fastload cart. --
>                    LOAD 26 seconds
>  with Mike J. Henry's "fastboot v2": 26 seconds

Doesn't it seem unusual that C128 fast serial, Epyx FastLoad, and Mike
Henry's fastboot all take the same amount of time (26 secs)? [This
really isn't intended to sound like a flame.] Here's what you wrote
earlier in the article:

>All tests were done with a 98 block file consisting of the main
>body of a BBS program. It was just the handiest program that I had
>available on both SFD and 1541 formats. I put it onto a blank 1541
>disk, to prevent fragmentation. It was already first on the SFD disk

From the numbers listed above, I would guess that you copied from the
SFD-1001 to a _1571_. Unless you set the interleave yourself (using
"U0>S"+chr$(interleave#)), the 1571 saves using a 6 sector interleave,
even when it's in 1541 mode. The C128 burst mode can easily keep up
with a 6 sector interleave, but Mike Henry's fastboot needs at least 8
sectors to decode and transfer, and Epyx FastLoad needs at least 10
(the 1541 standard). On a fresh disk, the program would be saved near
the directory track.
On this part of the disk, I think the sectors/track is 18. Since
FastLoad, fastboot V2, and C128 fast serial can't keep up with an
interleave of 6 sectors, they are forced to wait a full revolution plus
the interleave, or 18+6 = 24 sectors, between blocks! The interleave
forces a 24/6 = 4 times slowdown! Of course, in a 98 block file, not
all sectors can be stored exactly 6 apart, so this is just a GOOD
approximation. This is reasonably close to the 26 sec/8 sec ratio
(3.25) given by the above numbers.

Weren't we talking about SEQ file speedup? :-) What I'm getting at is
that the 1541/71 and SFD-1001 aren't good drives to use when studying
byte-at-a-time transfer overhead. That's because the DOS in those
drives only buffers a sector at a time, which forces it to use an
interleave scheme. It's possible to get around this (and Super Snapshot
V4 does, for LOAD only), but I haven't seen an implementation yet that
attempts to do this for SEQ file accesses. And since the transfer times
involved in this performance range (about 4-5 secs for 100 blocks, far
beyond standard CBM IEEE) are less than the overhead for byte-at-a-time
transfers, you wouldn't be able to get close to LOAD speedup for SEQ
accesses.

Here's my speedup summary (by "State of the Art" I mean 1541
interleave-INDEPENDENT serial fast loaders and C128 burst mode with
optimal interleave, NOT IEEE):

                                          std blocks     non-std blocks
A. "State of the Art" LOAD                12-15x         20-25x
B. "State of the Art" READ                6-7x (guess)   8-9x (guess)
   (not yet attempted)
C. Classical fast I/O LOAD                5-6x           n.a.
   e.g. Epyx FastLoad (interleave = 10)
D. Classical fast I/O READ                3-4x (guess)   n.a.
E. Standard LOAD & READ                   1x             n.a.

The IEEE interfaces that various people have discussed so far probably
fit in with "C".
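The revolution arithmetic above (a loader that misses the next sector waits 18+6 = 24 sector-times instead of 6) can be checked with a small simulation. This is a hypothetical sketch, not how any of the drives or loaders mentioned actually work: the allocator (step `interleave` sectors, then slide forward past used ones) is a simplified assumption rather than the exact CBM DOS algorithm, only one track is modelled, and a loader that "needs at least N sectors" is modelled as N - 1 sector-times of decode/transfer work after each block. The 18 sectors/track follows the post's estimate; the 1541 layout actually puts 19 sectors on tracks 18-24 and 21 on tracks 1-17, so it is left as a parameter.

```python
def layout_chain(sectors_per_track: int, interleave: int) -> list[int]:
    """Order in which a file's blocks fill one track (simplified allocator).

    Assumption: each block goes `interleave` sectors past the previous one;
    if that sector is already used, slide forward one sector at a time.
    """
    used = [False] * sectors_per_track
    chain = []
    sector = 0
    for _ in range(sectors_per_track):
        while used[sector]:
            sector = (sector + 1) % sectors_per_track
        used[sector] = True
        chain.append(sector)
        sector = (sector + interleave) % sectors_per_track
    return chain


def read_time(chain: list[int], sectors_per_track: int, min_interleave: int) -> int:
    """Sector-times needed to read `chain` with a loader that needs `min_interleave`.

    After reading a block, the loader is busy for `min_interleave - 1`
    sector-times while the disk keeps spinning. If the next block has already
    passed under the head, the loader waits for it to come round again.
    """
    busy = min_interleave - 1
    t = 0  # elapsed sector-times; the head is at the start of sector t % S
    for sector in chain:
        t += (sector - t) % sectors_per_track  # rotational wait
        t += 1                                 # read the sector itself
        t += busy                              # decode + transfer to the host
    return t


if __name__ == "__main__":
    chain = layout_chain(18, 6)  # 6-sector interleave, as saved by the 1571
    print("chain:", chain)
    for name, need in (("keeps up", 6), ("needs 8", 8), ("needs 10", 10)):
        print(f"{name:>9}: {read_time(chain, 18, need)} sector-times")
```

Under these assumptions a loader that keeps up needs 113 sector-times for the track, while loaders needing 8 or 10 sectors need 421 and 423. That is a ratio of about 3.7 rather than exactly 4, because the wrap-around steps in the chain (12 to 1, 13 to 2, and so on) are 7 sectors apart instead of 6, which is the "not all sectors can be stored exactly 6 apart" effect. The 8- and 10-sector loaders come out nearly identical, in line with fastboot and FastLoad both taking 26 seconds.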
The IEEE setups rank no higher than "C" only because they are using the
standard load routine with a faster ACPTR (to get any further speedup
they would have to SAVE with a tighter interleave or execute custom
LOAD routines within the IEEE device, but I doubt that any of the IEEE
owners on the net would want to have anything to do with this :-) ). A
good way to see byte-at-a-time overhead is to use RAMDOS or a 1581,
which buffers half a logical track (one physical track on one side of
the disk, i.e. 20 of the 40 logical sectors).

>Unfortunately I couldn't see if the Super Snapshot was faster than the
>Epyx or fastboot product. My brother sold ours because it was ...
>... "it wasn't any faster than the fastload cartridge" (his words,

SS V1 and SS V2 were "classical" fast loader implementations, so they
were only marginally faster than Epyx FastLoad (5.5x vs. 5x). The
actual transfer routines were significantly faster, but the 10 sector
interleave of the 1541 limited all these products to the same speed
range. The marginal speedup came from significantly faster head
stepping routines. You could SAVE at a different interleave to get some
extra speedup, but it usually wasn't worth the trouble. SS V3 and SS V4
use a MUCH faster, interleave-independent technique. The speedup over
Epyx FastLoad and similar products is very noticeable.

>Some trivia: the main difference between LOAD'ing (burst mode) and
>READ'ing (fastmode) on the 1571 is that fast mode negotiates a
>transaction for each byte, while burst mode negotiates on a per-block
>basis. Burst mode is unique in that manner -- even the IEEE drives
>negotiate on a per-byte basis (probably why they're slower than burst
>mode, despite fairly equivalent hardware).

I agree, per-block is the only way to get great speed. You have
probably noticed that the Burst mode examples in the 1571 user's manual
avoid using subroutine calls to get each byte as it arrives. With
transfer rates in the range used by Burst mode, such calls could slow
you down.
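The per-byte versus per-block distinction can be put into a toy cost model. This is a sketch under loudly hypothetical assumptions: all costs are in arbitrary units, and `wire_cost` and `handshake_cost` are made-up parameters, not measured 1571 or IEEE figures. The only real constant is the 254 bytes of file data a CBM DOS block carries after its 2-byte track/sector link.

```python
import math

# CBM DOS detail: each 256-byte block starts with a 2-byte track/sector
# link, leaving 254 bytes of file data.
BLOCK_DATA_BYTES = 254


def transfer_cost(n_bytes: int, wire_cost: float, handshake_cost: float,
                  per_block: bool) -> float:
    """Toy cost model in hypothetical units (not measured figures).

    wire_cost:      cost of moving one byte over the bus
    handshake_cost: fixed cost of negotiating one transaction
    per_block:      True  -> one negotiation per block (burst-style)
                    False -> one negotiation per byte (fast serial/IEEE-style)
    """
    if per_block:
        handshakes = math.ceil(n_bytes / BLOCK_DATA_BYTES)
    else:
        handshakes = n_bytes
    return n_bytes * wire_cost + handshakes * handshake_cost


if __name__ == "__main__":
    size = 98 * BLOCK_DATA_BYTES  # roughly the 98-block test file from the thread
    for wire in (10.0, 1.0):      # hypothetical slow wire vs. fast wire
        per_byte = transfer_cost(size, wire, 5.0, per_block=False)
        per_blk = transfer_cost(size, wire, 5.0, per_block=True)
        print(f"wire={wire}: per-byte / per-block = {per_byte / per_blk:.2f}")
```

With a slow wire the number of negotiations barely matters; once the wire gets fast, the per-byte negotiation term dominates the total, which is why per-block transfers win at burst-mode speeds.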
In practice, though, the bit rate that CBM chose for Burst mode turns
out to leave a fair bit of time to spare.

>Some other trivia: Using ACPTR should be faster than using GETIN, if
>subroutine overhead is as big a problem as some hint. GETIN has to
>do all sorts of testing to see where to dispatch to -- is it keyboard,
>or is it disk? This overhead should be noticible when compared to
>LOAD, which calls ACPTR directly. But for both the IEEE drives and the
>1541, there was no significant difference between LOAD and GETIN
>times, implying that transfer speed, and not internal Kernal overhead,
>was the limitation.

Again, this just shows how slow the standard ACPTR routine is, and how
important interleave limitations are no matter how fast the transfer
protocol is. Once you have overcome the limitations of interleave,
either by buffering whole tracks or by doing other nasty manipulations
:-), the true speed of the transfer protocol can shine. After all, IEEE
interfaces should be capable of much faster transfers.

For those who don't believe that interleave is as important as I've
said, try the following: create a file on a 1541, then compare the time
required to LOAD it using a classical fastloader like Epyx FastLoad
with the time required to SCRATCH the file. You should get the same
results from a C128 with a 1571 (using burst mode vs. SCRATCH). SCRATCH
has to follow the chain of sectors that are used in the file. Since the
only transfers involved in SCRATCH are internal, all of its time is
spent following the 10 sector interleaved chain (6 if you're using a
1571).

I think this posting was already too long about half way through! :-)

Marcel A. LeBlanc         | University of Toronto -- Toronto, Canada
leblanc@eecg.toronto.edu  | also: LMS Technologies Ltd, Fredericton, NB, Canada
-------------------------------------------------------------------------------
UUCP: uunet!utai!eecg!leblanc      BITNET: leblanc@eecg.utoronto (may work)
ARPA: leblanc%eecg.toronto.edu@relay.cs.net      CDNNET: <...>.toronto.cdn