Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!microsoft!brianw
From: brianw@microsoft.UUCP (Brian WILLOUGHBY)
Newsgroups: comp.sys.apple
Subject: Re: Really small question
Message-ID: <10071@microsoft.UUCP>
Date: 25 Dec 89 08:27:56 GMT
References: <kZTLjCG00WB7Q=4bJa@andrew.cmu.edu> <9542@microsoft.UUCP> <742@batman.moravian.EDU>
Reply-To: brianw@microsoft.UUCP (Brian WILLOUGHBY)
Organization: Microsoft Corp., Redmond WA
Lines: 63

nicholaA@batman.moravian.EDU (Andy Nicholas) writes:
>I thought the cycle times on MVN/MVP were 7 cycles per byte moved.  How
>is that as fast as DMA which is supposed to be (at least what I've always
>been told) 1 cycle per byte moved?

Have you compared the speeds in an actual coding situation?

As soon as I figured out how to assemble 16 bit opcodes using Merlin macros,
the first 16 bit program I wrote to use my new W65C802 was a full HGR screen
move in each of the available methods.  I had an 8 bit move loop, a 16 bit move
loop (which used X and Y as sixteen bit pointers into memory), and a MVN
instruction.  I repeated each move 16 times, so that my slow human perception
could get a handle on how long the process was taking.  Using alternating full
screens of black and white, it was VERY easy to see that MVN was clearly the
fastest.

I coded the fastest 16 bit move I could think of, using LDA 00,X with X as a
16 bit offset.  The actual address was not in the Zero Page, but using the Zero
Page (now Direct Page) addressing mode shaved an extra cycle off of every loop
iteration.

There was no mistaking it, the MVN was just as much an improvement over the 16
bit move loop as the 16 bit move was over the 8 bit move.  This is on a Plus,
but after I got a TransWarp I was faced with the same slow video cycles as the
GS.  Still the MVN method won.
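
For a rough sense of scale, the comparison can be put into numbers.  The Python
sketch below is back-of-the-envelope only.  The per-iteration cycle counts for
the two loops are my assumptions based on the usual 65816 instruction timings,
not measurements, and they ignore outer-loop overhead and slow video cycles.
The MVN figure is the 7 cycles per byte quoted at the top of this post, and
HGR_BYTES assumes an 8K hi-res page.

```python
# Rough cycle estimates for moving one hi-res page three ways.
# All per-iteration costs are assumptions, not measurements.

HGR_BYTES = 8192  # one hi-res page, assumed to be 8K

# method name -> (cycles per loop iteration, bytes moved per iteration)
METHODS = {
    "8 bit loop": (14, 1),   # assumed: LDA abs,X 4 + STA abs,X 5 + INX 2 + BNE 3
    "16 bit loop": (18, 2),  # assumed: LDA dp,X 5 + STA abs,X 6 + DEX DEX 4 + BNE 3
    "MVN": (7, 1),           # 7 cycles per byte moved, as quoted above
}


def estimate_cycles(method, nbytes=HGR_BYTES):
    """Estimated cycles to move nbytes with the named method."""
    cycles, width = METHODS[method]
    iterations = -(-nbytes // width)  # ceiling division
    return iterations * cycles


if __name__ == "__main__":
    for name in METHODS:
        print(f"{name:12s} {estimate_cycles(name):7d} cycles")
```

Under those assumptions the ordering matches what I saw on the screen: the 16
bit loop beats the 8 bit loop, and MVN beats both.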

>Generally, MVN/MVP is sort of a slow way to do things... or at least thats
>what most of the GS graphics gurus will tell you.  :-)

Well, for generating graphics screens from multiple smaller images (instead of
moving the entire graphics screen as a single unit), MVN doesn't offer many
advantages.  Then again, neither does the standard DMA move (as if it were
available on an Apple :-).  This is because writing a shape - or a window, or
any object smaller than the width of the graphics screen - to the video memory
is not a simple move with a single start address and length.  What you always
end up with is several shorter moves to each individual scan line.  With moves
that are shorter than 40 bytes (using the HGR screen as an example), the
advantages of MVN or MVP are not so great - and besides, there is so much room
for optimization in video routines that the static MVN instruction is just not
flexible enough.  Add to this the consideration that many plotting routines
might need to rotate bits within a byte in order to plot at different
locations, and the MVN becomes even less useful.
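
To illustrate why a shape turns into several short moves, here is a minimal
Python sketch that models the HGR page as a bytearray.  The function names
(hgr_row_address, draw_shape) are hypothetical, made up for this example, and
the sketch only handles byte-aligned shapes - it does none of the bit rotation
mentioned above.  The row address formula is the standard HGR interleave.

```python
HGR_BASE = 0x2000  # start of hi-res page 1
ROW_BYTES = 40     # bytes per scan line
ROWS = 192         # scan lines per page


def hgr_row_address(y):
    """Address of the first byte of scan line y (standard HGR interleave)."""
    return HGR_BASE + (y % 8) * 0x400 + ((y // 8) % 8) * 0x80 + (y // 64) * 0x28


def draw_shape(memory, shape, width, height, col, top):
    """Copy a width x height block of bytes to the screen.

    Issues one short move per scan line and returns the list of
    (destination address, length) pairs, one per move.
    """
    moves = []
    for row in range(height):
        dest = hgr_row_address(top + row) + col
        src = row * width
        memory[dest:dest + width] = shape[src:src + width]
        moves.append((dest, width))
    return moves


if __name__ == "__main__":
    memory = bytearray(0x4000)        # enough to cover $2000-$3FFF
    shape = bytes(range(1, 13))       # a 4 byte wide, 3 line tall shape
    for dest, length in draw_shape(memory, shape, 4, 3, col=10, top=0):
        print(f"move {length} bytes to ${dest:04X}")
```

Even this tiny shape needs three separate moves, and the destination addresses
are nowhere near each other, so a single MVN with one start address and one
length cannot do the job.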

I believe that you have *graphics* gurus telling you that MVN/MVP is slow for
*their* purposes, but these instructions are faster than a loop based move
algorithm for simple block moves of large areas of memory.  Do you think that
the Western Design Center engineers had nothing better to do one day than to
create a totally useless instruction?  They could have left these two opcodes
open for future expansion.  The 7 cycles per byte is the entire cost of the
move - there is no separate load, store, index update and branch to pay for on
every pass, which is where a hand-coded loop loses.

Side note: the video DMA circuitry in the Amiga has a start address, length
AND a scan line pitch value (address difference between two pixels located at
the same X position on the screen).  For the Amiga, moving square areas on the
video screen (like, say, windows) is super fast.  Plus, their bit-blitter does
the bit rotations that make Apple graphics programmers choose hand-coded loops
over block moves.  This is the kind of hardware I'd like to see in the GS!
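
The start address, length and pitch idea can be sketched in a few lines of
Python.  This is only an illustration of the concept on a linear frame buffer:
rect_copy and its parameters are hypothetical names of mine, not the Amiga's
actual register interface, and the bit rotation is left out.

```python
def rect_copy(memory, src, dst, row_len, rows, pitch):
    """Copy a rectangle inside a linear frame buffer.

    Moves `rows` runs of `row_len` bytes each.  Successive rows of both
    source and destination are `pitch` bytes apart in memory.
    """
    for row in range(rows):
        s = src + row * pitch
        d = dst + row * pitch
        memory[d:d + row_len] = memory[s:s + row_len]


if __name__ == "__main__":
    pitch = 8                               # assumed: a tiny 8 byte wide screen
    screen = bytearray(range(4 * pitch))    # 4 scan lines of test data
    # Copy a 2 byte wide, 3 line tall rectangle from column 0 to column 4.
    rect_copy(screen, src=0, dst=4, row_len=2, rows=3, pitch=pitch)
    for line in range(4):
        print(list(screen[line * pitch:(line + 1) * pitch]))
```

The whole rectangle is described by one set of parameters, which is exactly
what a plain block move with only a start address and a length cannot express.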

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet:         brianw@microsoft.UUCP