Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!microsoft!brianw
From: brianw@microsoft.UUCP (Brian WILLOUGHBY)
Newsgroups: comp.sys.apple
Subject: Re: Really small question (a really long explanation)
Message-ID: <10041@microsoft.UUCP>
Date: 22 Dec 89 02:11:58 GMT
References: <kZTLjCG00WB7Q=4bJa@andrew.cmu.edu> <9542@microsoft.UUCP> <1989Dec15.200302.8233@ncsuvx.ncsu.edu>
Reply-To: brianw@microsoft.UUCP (Brian Willoughby)
Organization: Microsoft Corp., Redmond WA
Lines: 87

rnf@shumv1.ncsu.edu (Rick Fincher) writes:
>brianw@microsoft.UUCP (Brian Willoughby) writes:
>>rnf@shumv1.ncsu.edu (Rick Fincher) writes:
>>>You can move the data by turning shadowing on then LDA and STA each 
>>>word back to its original location.  This puts the data in bank E1
>>>and is faster than the memory moves you were talking about, if you
>>>keep your loop overhead low.
>>
>>There are a couple of ways of doing this.  You could reload the Data Bank
>>register before doing the bank $00 prep, and then the stuff would already be
>>in bank $E1.  I think you would do LDA #$01 (or #$E1), PHA, PLB. After selecting
>>
>> [mention using MVN instruction]
>
>If you write directly into $E1 you do so at 1mhz.  The mvn instruction is
>fast but because of the way the writes to $E1 are slowed to 1mhz I think
>it is still faster to just read a word into a 16 bit register and write
>it back to the same location.  No bank boundaries are crossed, so no extra cycles
>are added for that, and shadowing lets the hardware do the actual copies.  I
>think the Apple guys added up all of the cycles and determined that this was
>the fastest way to do this (Matt, Dave?).

Nope, nothing comes for free.  Writes (but not reads) to banks $00 or $01 occur
at the same speed as writes to $E0/$E1 as long as shadowing is on (provided
that you are accessing the addresses set aside for video).  The "Apple guys"
only allowed shadowing so that ][+ and //e programs would still function, even
though these programs are unaware that video memory has been moved to $E0/$E1.
Thus, it was a compatibility issue, not a speed issue.  I don't think that
there is a case (for a GS-specific program) where shadowing allows faster
execution times.  For a non-GS program it just wouldn't work without shadowing.
Fortunately, shadowing doesn't cause writes OUTSIDE of the video areas to be
slowed.

If you still prefer shadowing, then you could save time by causing the MVN
instruction to move back to the same location (source == destination).  A
hand-coded loop will always be slower than MVN, except for cases where a
different kind of move is needed, such as an I/O move where one address stays
fixed while the other steps through a memory buffer (e.g. reading from a
single SCSI port address into a memory buffer).  Thus, the only limitation of
MVN (or MVP) is that BOTH the source and destination addresses must be
changing.
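
To put rough numbers on the MVN-versus-loop comparison, here is a small
Python sketch.  The 7 cycles per byte for MVN is the documented 65816 figure.
The loop shown is only one hypothetical hand-coded loop (16-bit accumulator
and index registers), and its cycle counts are optimistic nominal values that
I am assuming, not measurements.  The stretching of video writes is ignored
because it applies to both methods.

```python
# Rough per-byte cost comparison on a 65816; not a measurement of real GS hardware.
MVN_CYCLES_PER_BYTE = 7  # documented cost of MVN/MVP per byte moved

# Hypothetical loop (an assumption for illustration, not anyone's actual code):
#   loop: LDA addr,X / STA addr,X / DEX / DEX / BPL loop
# Cycle counts are optimistic nominal values for 16-bit registers.
LOOP_CYCLES = {
    "LDA abs,X": 5,   # may cost one more cycle depending on indexing
    "STA abs,X": 6,
    "DEX (1st)": 2,
    "DEX (2nd)": 2,
    "BPL taken": 3,
}
BYTES_PER_ITERATION = 2  # one 16-bit word per pass


def loop_cycles_per_byte():
    """Average cycles spent per byte by the hypothetical loop."""
    return sum(LOOP_CYCLES.values()) / BYTES_PER_ITERATION


def total_cycles(byte_count, cycles_per_byte):
    """Total cycles to move byte_count bytes at a given per-byte cost."""
    return byte_count * cycles_per_byte


if __name__ == "__main__":
    size = 32768  # 32K, an example buffer size
    print("MVN :", total_cycles(size, MVN_CYCLES_PER_BYTE))
    print("loop:", total_cycles(size, loop_cycles_per_byte()))
```

These figures only describe this particular loop.  A different loop would
need its own count.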

EXPLANATION:
Only one cycle of any direct write to $E1 is at 1 MHz, the rest of the cycles
for that instruction are at full speed.  This limitation exists because the video
circuitry is using the $E0/$E1 RAM banks at 1 MHz for 50% of the time, and the
CPU can only "get in" at regular intervals during the other 50%.  The Mac also
suffers from the same limitation (except for the SE/030 which has dual port
RAM.  OK Apple, when do we see this technology in a ][?).

There is hardware in the GS to "stretch" any cycle which accesses the video
memory, based on the address generated by the CPU.  Fortunately there are two
sets of RAM banks, so it is possible to write to both at the same time with
shadowing on.

Here is the catch: if you have shadowing on, then you are technically writing
into video memory and the CPU still slows down for that cycle.  There is no
magical way of sneaking past this requirement because the whole system must
synchronize to the video memory.  If the hardware didn't wait for the video
write to complete, then there would be a possibility that the CPU would do a
16 bit write at 2.8 MHz to bank $01 with shadowing on, and the second byte
would have nowhere to go because at 2.8 MHz the first byte would not yet be
written to the 1 MHz video memory.

1 MHz clock
  -----------------  actual       -----------------  actual       ------------
  | Video read    | Write byte 1  | Video read    | Write byte 2  | Video read  
---               -----------------               -----------------

2 MHz (I didn't want to try to illustrate 2.8 MHz!)
  ---------       ---------       ---------       ---------       ---------
  | Write | 1     | Write | 2     |       |       |       |       |       |
---       ---------       ---------       ---------       ---------       ----

The first write attempt conflicts with video access to $Ex, and so it is
delayed.  The second write is impossible unless the 2 MHz CPU clock is
stretched to sync up with the 1 MHz video timing.
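
The same idea as a toy Python model, in case the diagram is hard to read.
This is only a sketch under my own assumptions: time is counted in ticks of
0.25 us, the first half of each 1 MHz period belongs to the video read and
the second half is the CPU's write slot, and a write needs a whole slot.  The
names and tick size are made up for illustration and do not come from any
Apple documentation.

```python
VIDEO_PERIOD = 4    # ticks per 1 MHz cycle (1 tick = 0.25 us, arbitrary choice)
CPU_SLOT_START = 2  # assumed: second half of each period is open to the CPU


def next_cpu_slot(t):
    """Return the first tick >= t at which a CPU write slot begins."""
    phase = t % VIDEO_PERIOD
    wait = (CPU_SLOT_START - phase) % VIDEO_PERIOD
    return t + wait


def write_bytes(start_tick, count):
    """Simulate back-to-back byte writes to shadowed video memory.

    Returns a list of (requested, started, finished) ticks, one per byte.
    The CPU cycle is stretched until the write slot has finished.
    """
    t = start_tick
    log = []
    for _ in range(count):
        started = next_cpu_slot(t)
        finished = started + (VIDEO_PERIOD - CPU_SLOT_START)
        log.append((t, started, finished))
        t = finished  # the next write can only be requested after the stretch
    return log


if __name__ == "__main__":
    for requested, started, finished in write_bytes(0, 2):
        print(requested, started, finished)
```

In this model two bytes take 8 ticks (two full 1 MHz periods), where an
unstretched 2 MHz CPU would have tried to finish them in 4.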

P.S. Hey Rick, do you remember we met at the NCSU Computing Center back when
you used to work there?  I was attending NCSU at the time and it was my first
exposure to the GS.

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet:         brianw@microsoft.UUCP