Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!mips!ptimtc!nntp-server.caltech.edu!toddpw
From: toddpw@nntp-server.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: Re: Animation
Keywords: Animation
Message-ID: <1991Apr17.095436.10764@nntp-server.caltech.edu>
Date: 17 Apr 91 09:54:36 GMT
References: <1991Apr17.061057.22357@cs.uow.edu.au>
Organization: California Institute of Technology, Pasadena
Lines: 72

u9050728@cs.uow.edu.au (Shane Kelvin Richards) writes:

[ stuff deleted ]

>       My question is, is my basic ideas/technique correct? Am I using
>the wrong method for fast shape manipulation. OR am I using the correct
>method and I should just try to improve upon my code and optimise where
>I can?

Your method is reasonable, but the time-wasters are pretty obvious. Read on.

>      For simplicity I only let shapes move by 2 pixels so that they 
>always fall on a byte boundary. Also, I am using the 320x200 resolution.
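The byte-boundary arithmetic behind moving in 2-pixel steps can be sketched as follows. This is a minimal Python model that assumes the standard 320-mode layout: 4 bits per pixel, 160 bytes per line, and pixel data starting at $2000. The names `shr_offset` and `is_byte_aligned` are hypothetical helpers, not part of any Apple toolbox.

```python
# 320-mode super hi-res: 4 bits per pixel, so 2 pixels per byte and 160 bytes per line.
SHR_BASE = 0x2000       # pixel data starts at $2000 (bank $E1, or bank $01 when shadowed)
BYTES_PER_LINE = 160
LINES = 200


def shr_offset(x: int, y: int) -> int:
    """Address of the byte holding pixel (x, y) in 320 mode."""
    if not (0 <= x < 320 and 0 <= y < LINES):
        raise ValueError("pixel out of range")
    return SHR_BASE + y * BYTES_PER_LINE + x // 2


def is_byte_aligned(x: int) -> bool:
    """An object whose left edge sits at an even x starts on a byte boundary."""
    return x % 2 == 0


if __name__ == "__main__":
    print(hex(shr_offset(0, 0)))       # 0x2000
    print(hex(shr_offset(319, 199)))   # 0x9cff, the last byte of the pixel area
```

Because each byte holds exactly two pixels, any move by an even number of pixels keeps every object byte aligned with a screen byte. No shifting is needed.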

Time-waster #1: loops. If you are looping through the picture data and the
mask, you are spending a non-trivial amount of time in loop overhead. Unroll
the loops instead: code a long string of instructions with the offsets
hardcoded as the addresses, and use the index register(s) to hold the low
word of the data address. You can do truly evil things this way if you map
the SHR buffer into bank 0 so the direct page (or stack) can point into it
(better disable interrupts temporarily though!):

	lda	0,x		;dp points to object location on screen
	and	|0,y		;DBR/Y points to mask (1 bits keep the background)
	ora	|$1000,y	;suppose the image is 4K past the mask
	sta	0,x
	lda	1,x		;next byte (8-bit accumulator)
	and	|1,y		;mask and image offsets advance too
	ora	|$1001,y
	sta	1,x
	...

Note that the above example assumes the image starts at a fixed distance
from the mask, so a single index register (Y) can address both. It is a
speed vs. memory tradeoff.
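The following Python sketch models what the unrolled sequence computes, and it also generates the instruction text with every offset hardcoded. As in the post, it assumes the image sits $1000 bytes past the mask. It also assumes mask bits are 1 wherever the background should show through. `emit_unrolled` and `blit` are hypothetical names.

```python
IMAGE_OFS = 0x1000  # assumed fixed distance from mask to image (4K, as in the post)


def emit_unrolled(nbytes: int) -> list[str]:
    """Emit an unrolled 8-bit mask-and-draw sequence, one 4-instruction group per byte."""
    lines = []
    for i in range(nbytes):
        lines += [
            f"\tlda\t{i},x",                    # background byte (direct page + X)
            f"\tand\t|{i},y",                   # clear the object's pixels with the mask
            f"\tora\t|${IMAGE_OFS + i:04X},y",  # OR in the image byte
            f"\tsta\t{i},x",                    # store the result back to the screen
        ]
    return lines


def blit(screen: bytearray, dst: int, data: bytes, src: int, nbytes: int) -> None:
    """Byte-for-byte equivalent of the unrolled code: (screen AND mask) OR image."""
    for i in range(nbytes):
        mask = data[src + i]
        image = data[src + IMAGE_OFS + i]
        screen[dst + i] = (screen[dst + i] & mask) | image


if __name__ == "__main__":
    print("\n".join(emit_unrolled(2)))
    data = bytearray(IMAGE_OFS + 1)
    data[0] = 0x0F            # mask keeps the background's low nibble
    data[IMAGE_OFS] = 0xA0    # image supplies the high nibble
    screen = bytearray([0x55])
    blit(screen, 0, data, 0, 1)
    print(hex(screen[0]))     # 0xa5
```

Generating the unrolled code from a script avoids typing hundreds of nearly identical lines by hand. It also keeps the mask, image, and screen offsets in step.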

Time-waster #2: rectangular objects. Depending on the types of objects you want
to animate, it may actually help to pack the image and its mask so that dead
space in the object rectangle is replaced by offset/length values for each
line of the object. This is almost always a win.
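Here is a rough Python sketch of per-line packing. Each line keeps only the span between its first and last non-transparent byte, plus the offset of that span. It assumes that a mask byte of $FF means "fully transparent" and that one span per line is enough. `Span`, `pack`, and `draw` are hypothetical names.

```python
from typing import NamedTuple

TRANSPARENT = 0xFF  # assumed: a mask byte of $FF keeps both background pixels


class Span(NamedTuple):
    offset: int   # bytes of dead space skipped at the left of this line
    mask: bytes   # mask bytes for the live part of the line
    image: bytes  # image bytes for the live part of the line


def pack(mask_rows, image_rows):
    """Replace dead space on each line with an offset and a (possibly empty) span."""
    spans = []
    for m, img in zip(mask_rows, image_rows):
        live = [i for i, b in enumerate(m) if b != TRANSPARENT]
        if not live:
            spans.append(Span(0, b"", b""))  # fully transparent line
            continue
        lo, hi = live[0], live[-1] + 1
        spans.append(Span(lo, bytes(m[lo:hi]), bytes(img[lo:hi])))
    return spans


def draw(screen, base, pitch, spans):
    """Draw packed lines; pitch is bytes per screen line (160 in 320 mode)."""
    for row, s in enumerate(spans):
        start = base + row * pitch + s.offset
        for i, (m, img) in enumerate(zip(s.mask, s.image)):
            screen[start + i] = (screen[start + i] & m) | img
```

One span per line keeps the draw loop simple. An object with large interior gaps would need several spans per line for the packing to pay off.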

Time-waster #3: the mask itself. If you can live with a mask that works per
byte instead of per pixel (each byte of the object is either fully drawn or
fully transparent), you can get even more speed, but at real memory expense:
you hardcompile each object into code that draws it simply by storing it
(using the index-with-hardcoded-offset technique from above).

If you want EVEN MORE speed, you can use the stack to push bytes directly
onto the picture (this looks sick but is actually pretty easy to do once
you know what's involved). What's cool about stack-romping is that you
can push arbitrary words with PEAs, repeated values and one-byte values
with pha/phx/phy, and skip bytes with an sbc #xxxx; tcs sequence. The skip
works if A holds a running copy of the stack pointer that you decrement by
every hop (pushed and skipped bytes alike), with the carry set so SBC does
not borrow. A simple way to set this up is to pass the location of the
object as the address of its last byte (the stack grows downward), so the
draw code can start with a tcs. The major drawback is that you have
hardcompiled code PER OBJECT. I haven't tried this yet, but I suspect the
code is about as large as the image & mask data, so you lose a bit of mask
resolution but not much else.
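The stack-romping idea can be modelled in Python as a tiny "compiler" and a simulator. The compiler turns one line of a per-byte-masked object into PEA, one-byte push, and skip operations. The simulator applies those operations the way the 65816 stack would: S starts at the object's last byte and moves downward. This is a model, not a 65816 code generator. Using `None` for a transparent byte, the op names, and the functions `compile_line` and `run` are all assumptions made for the example.

```python
# Per-byte mask: each entry is a byte value, or None for a transparent byte.


def compile_line(pixels):
    """Turn one line into push/skip ops, working right to left like the stack does."""
    ops = []
    i = len(pixels) - 1
    while i >= 0:
        if pixels[i] is None:                        # run of transparent bytes
            n = 0
            while i >= 0 and pixels[i] is None:
                n += 1
                i -= 1
            ops.append(("skip", n))                  # SBC #n / TCS: move S, write nothing
        elif i >= 1 and pixels[i - 1] is not None:   # two opaque bytes: one PEA
            ops.append(("pea", pixels[i] << 8 | pixels[i - 1]))
            i -= 2
        else:                                        # lone opaque byte: 8-bit push
            ops.append(("push8", pixels[i]))
            i -= 1
    if ops and ops[-1][0] == "skip":                 # nothing is drawn after a final hop
        ops.pop()
    return ops


def run(ops, mem, last_byte_addr):
    """Simulate the pushes; returns the final stack pointer."""
    s = last_byte_addr                               # draw code starts with TCS
    for op, val in ops:
        if op == "pea":
            mem[s] = val >> 8                        # high byte goes to S ...
            mem[s - 1] = val & 0xFF                  # ... low byte to S-1
            s -= 2
        elif op == "push8":
            mem[s] = val
            s -= 1
        else:
            s -= val                                 # skip: hop over background bytes
    return s


if __name__ == "__main__":
    line = [None, 0x12, 0x34, None, None, 0x56, 0x78, 0x9A]
    ops = compile_line(line)
    print(ops)
    screen = bytearray([0xEE] * 16)
    run(ops, screen, 4 + len(line) - 1)              # object occupies bytes 4..11
    print(screen[4:12].hex())                        # ee1234eeee56789a
```

Because a PEA stores its high byte at S and its low byte at S-1, the word for two adjacent screen bytes is built as (right byte << 8) | left byte. The background bytes under the skips are never touched.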

Time-waster #4: the shadowing itself. If you are going to be drawing over
objects a lot, turn off shadowing while you draw the scene, then turn it
back on and do a single romp copy of the bank 1 SHR buffer onto itself; the
shadowing hardware mirrors those writes to the bank $E1 screen. This can be
done by remapping memory so that the stack & dp can point into the buffer,
and issuing a series of
	pei $fe
	pei $fc
	...
	pei $2
	pei $0
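and hopping the dp register down one page after each page. (Start with S on
the last byte of the top page, so each PEI writes its word back exactly where
it came from; after 128 PEIs S has dropped by $100, and D must follow it.)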
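This small Python simulation shows why the PEI sequence copies a page onto itself. It assumes S starts at the last byte of the top page and D at that page's base. It ignores the memory remapping and interrupt handling a real implementation needs. `ShadowedMemory` and `pei_slam` are hypothetical names, and the mirroring rule is a simplification of the IIgs shadow hardware.

```python
class ShadowedMemory:
    """Bank $01 SHR buffer; writes are mirrored to bank $E1 only while shadowing is on."""

    def __init__(self, size):
        self.bank01 = bytearray(size)
        self.bankE1 = bytearray(size)   # what the video hardware displays
        self.shadow = True

    def write(self, addr, value):
        self.bank01[addr] = value
        if self.shadow:
            self.bankE1[addr] = value


def pei_slam(mem, first_page, last_page):
    """Copy pages last_page down to first_page onto themselves with PEI pushes."""
    d = last_page << 8                  # D = base of the top page
    s = d + 0xFF                        # S = last byte of that page
    while d >= first_page << 8:
        for n in range(0xFE, -1, -2):   # pei $fe, pei $fc, ..., pei $0
            lo = mem.bank01[d + n]      # PEI reads the word at D+n ...
            hi = mem.bank01[d + n + 1]
            mem.write(s, hi)            # ... and pushes it high byte first,
            mem.write(s - 1, lo)        # so it lands exactly where it came from
            s -= 2
        d -= 0x100                      # S is now D-1, so hop D down one page to match


if __name__ == "__main__":
    mem = ShadowedMemory(0x400)
    mem.shadow = False                  # draw the scene without shadowing
    for a in range(0x400):
        mem.write(a, (a * 7) & 0xFF)
    mem.shadow = True                   # then one pass updates the display
    pei_slam(mem, 0, 3)
    print(mem.bankE1 == mem.bank01)     # True
```

On a real machine, running the pages from $9F down to $20 would cover the whole $2000-$9FFF SHR area.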

I am not positive but I strongly suspect that both #3 and #4 are used by
the FTA Space Harrier demo.

Todd Whitesel
toddpw @ tybalt.caltech.edu