Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!microsoft!brianw
From: brianw@microsoft.UUCP (Brian WILLOUGHBY)
Newsgroups: comp.sys.apple
Subject: Re: Really small question
Message-ID: <10071@microsoft.UUCP>
Date: 25 Dec 89 08:27:56 GMT
References: <9542@microsoft.UUCP> <742@batman.moravian.EDU>
Reply-To: brianw@microsoft.UUCP (Brian WILLOUGHBY)
Organization: Microsoft Corp., Redmond WA
Lines: 63

nicholaA@batman.moravian.EDU (Andy Nicholas) writes:
>I thought the cycle times on MVN/MVP were 7 cycles per byte moved.  How
>is that as fast as DMA which is supposed to be (at least what I've always
>been told) 1 cycle per byte moved?

Have you compared the speeds in an actual coding situation?

As soon as I figured out how to assemble 16-bit opcodes using Merlin
macros, the first 16-bit program I wrote to use my new W65C802 was a
full HGR screen move in each of the available methods.  I had an 8-bit
move loop, a 16-bit move loop (which used X and Y as 16-bit pointers
into memory), and an MVN instruction.  I repeated each move 16 times, so
that my slow human perception could get a handle on how long the process
was taking.  Using alternating full screens of black and white, it was
VERY easy to see that MVN was clearly the fastest.

I coded the fastest 16-bit move I could think of, using LDA 00,X.  With
X as a 16-bit offset the actual address was not in the Zero Page, but
using the Zero Page (now Direct Page) addressing mode shaved an extra
cycle off every loop iteration.  There was no mistaking it: the MVN was
just as much an improvement over the 16-bit move loop as the 16-bit move
was over the 8-bit move.  This was on a Plus, but after I got a
TransWarp I was faced with the same slow video cycles as the GS.  Still,
the MVN method won.

>Generally, MVN/MVP is sort of a slow way to do things... or at least thats
>what most of the GS graphics gurus will tell you.
:-)  Well, for generating graphics screens from multiple smaller images
(instead of moving the entire graphics screen as a single unit), MVN
doesn't offer many advantages.  Then again, neither does the standard
DMA move (as if it were available on an Apple :-).  This is because
writing a shape - or a window, or any object narrower than the graphics
screen - to the video memory is not a simple move with a single start
address and length.  What you always end up with is several shorter
moves, one to each individual scan line.  With moves that are shorter
than 40 bytes (using the HGR screen as an example), the advantages of
MVN or MVP are not so great - and besides, there is so much room for
optimization in video routines that the static MVN instruction is just
not flexible enough.  Add to this the consideration that many plotting
routines might need to rotate bits within a byte in order to plot at
different locations, and the MVN becomes even less useful.

I believe that you have *graphics* gurus telling you that MVN/MVP is
slow for *their* purposes, but these instructions are faster than a
loop-based move algorithm for simple block moves of large areas of
memory.  Do you think that the Western Design Center engineers had
nothing better to do one day than to create a totally useless
instruction?  They could have left these two opcodes open for future
expansion.  The 7 cycles is instruction setup time - the move occurs at
a rate of 1 cycle per byte.

Side note: the video DMA circuitry in the Amiga has a start address, a
length AND a scan line pitch value (the address difference between two
pixels located at the same X position on adjacent scan lines).  For the
Amiga, moving rectangular areas on the video screen (like, say, windows)
is super fast.  Plus, their bit-blitter does the bit rotations that make
Apple graphics programmers choose hand-coded loops over block moves.
This is the kind of hardware I'd like to see in the GS!
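
To make the "several shorter moves" point concrete, here is a minimal
sketch in C (not 65816 code) of copying a rectangle into a frame buffer
one scan line at a time, using a start address, a width, a height and a
pitch.  The name copy_rect and all of its parameters are made up for
illustration, and it assumes a linear frame buffer with a constant
pitch, which is a simplification; it also ignores the bit rotation
needed to plot at arbitrary pixel positions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a width x height block of bytes between two
   buffers whose scan lines start a fixed number of bytes apart (the
   "pitch").  Assumes a constant pitch for both source and destination. */
void copy_rect(unsigned char *dst, size_t dst_pitch,
               const unsigned char *src, size_t src_pitch,
               size_t width, size_t height)
{
    /* A row wider than the pitch would spill into the next scan line. */
    assert(width <= dst_pitch && width <= src_pitch);

    /* One short block move per scan line, instead of one long move. */
    for (size_t row = 0; row < height; row++) {
        memcpy(dst + row * dst_pitch, src + row * src_pitch, width);
    }
}
```

Each pass through the loop is the kind of short move where a block move
instruction has little room to pay off; a single long move only works
when width equals the pitch, i.e. when the whole screen moves as a unit.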

Brian Willoughby
UUCP:     ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet: microsoft!brianw@uunet.UU.NET
  or:     microsoft!brianw@Sun.COM
Bitnet    brianw@microsoft.UUCP