Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!microsoft!brianw
From: brianw@microsoft.UUCP (Brian WILLOUGHBY)
Newsgroups: comp.sys.apple
Subject: Re: Really small question (a really long explanation)
Message-ID: <10041@microsoft.UUCP>
Date: 22 Dec 89 02:11:58 GMT
References: <9542@microsoft.UUCP> <1989Dec15.200302.8233@ncsuvx.ncsu.edu>
Reply-To: brianw@microsoft.UUCP (Brian Willoughby)
Organization: Microsoft Corp., Redmond WA
Lines: 87

rnf@shumv1.ncsu.edu (Rick Fincher) writes:
>brianw@microsoft.UUCP (Brian Willoughby) writes:
>>rnf@shumv1.ncsu.edu (Rick Fincher) writes:
>>>You can move the data by turning shadowing on then LDA and STA each
>>>word back to its original location. This puts the data in bank E1
>>>and is faster than the memory moves you were talking about, if you
>>>keep your loop overhead low.
>>
>>There are a couple of ways of doing this. You could reload the Data Bank
>>register before doing the bank $00 prep, and then the stuff would already be
>>in bank $E1. I think you would do LDA #01 (or $E1), PHA, PLB. After selecting
>>
>> [mention using MVN instruction]
>
>If you write directly into $E1 you do so at 1mhz. The mvn instruction is
>fast but because of the way the writes to $E1 are slowed to 1mhz I think
>it is still faster to just read a word into a 16 bit register and write
>it back to the same location. No bank boundaries are crossed so no extra
>cycles are added for that, and shadowing lets the hardware do the actual
>copies. I think the Apple guys added up all of the cycles and determined
>that this was the fastest way to do this (Matt, Dave?).

Nope, nothing comes for free. Writes (but not reads) to banks $00 or $01
occur at the same speed as writes to $E0/$E1 as long as shadowing is on
(provided that you are accessing the addresses set aside for video).
The "Apple guys" only allowed shadowing so that ][+ and //e programs would
still function, even though these programs are unaware that video memory
has been moved to $E0/$E1. Thus, it was a compatibility issue, not a speed
issue. I don't think that there is a case (for a GS-specific program) where
shadowing allows faster execution times. A non-GS program simply wouldn't
work without shadowing. Fortunately, shadowing doesn't slow down writes
OUTSIDE of the video areas.

If you still prefer shadowing, then you could save time by having the MVN
instruction move the data back to the same location (source ==
destination). A hand-coded loop will always be slower than MVN, except for
cases where a different kind of move is needed, such as an I/O move where
you keep reading from (or writing to) the same address while stepping
through a memory buffer (e.g. reading from a single SCSI port address into
a memory buffer). Thus, the only limitation of MVN (or MVP) is that BOTH
the source and destination addresses must be changing.

EXPLANATION:

Only one cycle of any direct write to $E1 is at 1 MHz; the rest of the
cycles for that instruction are at full speed. This limitation exists
because the video circuitry is using the $E0/$E1 RAM banks at 1 MHz for 50%
of the time, and the CPU can only "get in" at regular intervals during the
other 50%. The Mac also suffers from the same limitation (except for the
SE/30, which has dual-port RAM. OK Apple, when do we see this technology in
a ][?). There is hardware in the GS to "stretch" any cycle which accesses
the video memory, based on the address generated by the CPU.

Fortunately there are two sets of RAM banks, so it is possible to write to
both at the same time with shadowing on. Here is the catch: if you have
shadowing on, then you are technically writing into video memory and the
CPU still slows down for that cycle. There is no magical way of sneaking
past this requirement because the whole system must synchronize to the
video memory.
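
To put rough numbers on the MVN-versus-loop comparison, here is a
back-of-the-envelope model in Python (for a desk calculation, not for the
GS itself). Everything in it is an assumption made for illustration: a
2.8 MHz fast clock, a stretched write cycle costing a flat 1 microsecond,
7 cycles per byte for MVN, and a simple 16-bit loop of LDA abs,X /
STA abs,X / INX / INX / CPX # / BNE at 21 cycles per word. It only counts
write cycles as stretched, which matches the shadowing case where reads
from bank $00/$01 stay fast.

```python
# Rough timing model. Every constant here is an assumption for illustration,
# not a measured figure.
FAST_MHZ = 2.8   # assumed CPU clock for unstretched cycles
SLOW_MHZ = 1.0   # assumed effective rate of a cycle stretched to video timing


def microseconds(total_cycles, stretched_cycles):
    """Time for a sequence in which `stretched_cycles` of the
    `total_cycles` are write cycles slowed to the video clock."""
    fast_cycles = total_cycles - stretched_cycles
    return fast_cycles / FAST_MHZ + stretched_cycles / SLOW_MHZ


# MVN: assumed 7 cycles per byte moved, one of which is the write.
MVN_CYCLES_PER_BYTE = 7
mvn_us_per_byte = microseconds(MVN_CYCLES_PER_BYTE, 1)

# Hand-coded 16-bit loop: assumed 5 + 6 + 2 + 2 + 3 + 3 = 21 cycles per
# word, two of which are byte writes (both stretched).
LOOP_CYCLES_PER_WORD = 5 + 6 + 2 + 2 + 3 + 3
loop_us_per_byte = microseconds(LOOP_CYCLES_PER_WORD, 2) / 2

print(f"MVN : {mvn_us_per_byte:.2f} us/byte")
print(f"loop: {loop_us_per_byte:.2f} us/byte")
```

Under these assumptions MVN comes out at about 3.14 microseconds per byte
and the loop at about 4.39. Both pay the same one stretched cycle per byte
written, so the difference is entirely loop overhead. The exact figures
depend on the assumed cycle counts and on how long a stretched cycle really
takes.
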
If the hardware didn't wait for the video write to complete, then there
would be a possibility that the CPU would do a 16 bit write at 2.8 MHz to
bank $01 with shadowing on, and the second byte would have nowhere to go
because at 2.8 MHz the first byte would not yet be written to the 1 MHz
video memory.

1 MHz clock
   -----------------     actual      -----------------     actual      ------------
   |  Video read   |  Write byte 1   |  Video read   |  Write byte 2   |  Video read
---                 -----------------                 -----------------

2 MHz (I didn't want to try to illustrate 2.8 MHz!)
   ---------        ---------        ---------        ---------        ---------
   | Write |   1    | Write |   2    |       |        |       |        |       |
---         --------         --------         --------         --------         ----

The first write attempt conflicts with video access to $Ex, and so it is
delayed. The second write is impossible unless the 2 MHz CPU clock is
stretched to sync up with the 1 MHz video timing.

P.S. Hey Rick, do you remember we met at the NCSU Computing Center back
when you used to work there? I was attending NCSU at the time and it was my
first exposure to the GS.

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet          brianw@microsoft.UUCP