Path: utzoo!attcan!uunet!samsung!uakari.primate.wisc.edu!sdd.hp.com!ucsd!helios.ee.lbl.gov!nosc!crash!jcs
From: jcs@crash.cts.com (John Schultz)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Using CPU instead of Blitter for speed
Message-ID: <3035@crash.cts.com>
Date: 7 Jun 90 18:15:23 GMT
References: <1990Jun4.134811.12142@watdragon.waterloo.edu> <3009@crash.cts.com> <8256@crdgw1.crd.ge.com>
Distribution: comp
Organization: Crash TimeSharing, El Cajon, CA
Lines: 118
X-Local-Date: 7 Jun 90 11:15:23 PDT

In article <8256@crdgw1.crd.ge.com> barnettj@dollar.crd.ge.com (Janet A Barnett) writes:
>In article <3009@crash.cts.com> jcs@crash.cts.com (John Schultz) writes:
>>  Unfortunately, with a bitplane oriented display, you only get parallelism
>>on your last bitplane blit. True, you can compute your next blit values before
>>waiting, but that buys you little time.
>
>What about the graphics library routine QBlit(blitnode)?  I set up all
>my blits ahead of time in a linked list pointed to by the blitnode
>argument to QBlit().  The blitnode contains a pointer to a routine to

  I wrote my own blitter interrupt code to clear the screen in rectangular
chunks as opposed to a linear clear of all memory. The interrupt version
was slower than the single linear clear. The problem with queueing up
blits is that your application may not be able to do anything else until
the blits are done. In a flight simulator, you must draw all of your
polygons in the correct order. This means that the polygons must be
transformed, clipped and sorted first, and only then can any drawing take
place. Further, the processor is used to draw points, and the points can't
be drawn until the blitter is finished. I was going to put processor
drawing code in the blitter interrupt code, but the payoff from using
blitter interrupts was too small to justify continuing.
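The ordering constraint above is why every polygon has to be transformed, clipped and sorted before the first one is drawn. A minimal C sketch of the sort step, assuming a simple back-to-front depth sort (the painter's algorithm) and a hypothetical Poly record carrying one depth key per polygon:

```c
#include <stdlib.h>

/* Hypothetical polygon record: only the fields needed for ordering. */
typedef struct {
    int  id;     /* which polygon this is */
    long depth;  /* view-space depth after transform/clip; larger = farther */
} Poly;

/* qsort comparator: farthest polygon first (back-to-front). */
static int farther_first(const void *a, const void *b)
{
    long da = ((const Poly *)a)->depth;
    long db = ((const Poly *)b)->depth;
    return (da < db) - (da > db);  /* negative when a is farther than b */
}

/* Sort the visible polygons so that drawing them in array order lets
   nearer polygons overwrite farther ones. */
static void sort_back_to_front(Poly *polys, size_t n)
{
    qsort(polys, n, sizeof *polys, farther_first);
}
```

Nothing can be drawn until this sort over the whole frame has finished, which is why queueing blits buys so little here.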

>Of course, everything has caveats.  Obviously, there is more overhead
>in QBlit than OwnBlitter/WaitBlit/DisownBlitter, but hopefully you can
>make use of the time you don't spend in WaitBlit.  Further, if your
>blits are small, it is possible that one interrupt will start several
>blits in a row, probably resulting in no real net improvement from the
>parallel nature of the coprocessor.  And, I read once, long ago, that
>there was a problem with WaitBlit in a heavily loaded system. Could

  WaitBlit() shouldn't have any problems. If you roll your own, you
must read a chip memory or hardware location before testing the
blitter-busy bit (BBUSY, bit 14 of DMACONR), as described in the 1.3
Amiga Hardware Manual.
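A minimal C sketch of a hand-rolled wait, assuming the manual's advice amounts to one throw-away read before polling BBUSY (bit 14 of DMACONR, at $DFF002). Here the dummy read is of DMACONR itself. The helper name wait_blit and its pointer parameter are hypothetical, chosen so the logic can be exercised against a fake register; in real code, calling the graphics library's WaitBlit() is simpler.

```c
#include <stdint.h>

#define DMACONR_ADDR 0xDFF002UL  /* read-only DMA control/status register */
#define BBUSY        0x4000u     /* bit 14: set while the blitter is busy */

/* Busy-wait until the blitter is idle.  Returns how many times the busy
   bit was seen set (for illustration only). */
static unsigned long wait_blit(volatile const uint16_t *dmaconr)
{
    unsigned long polls = 0;

    (void)*dmaconr;            /* dummy read: don't trust the first result */
    while (*dmaconr & BBUSY)   /* spin until the blit has finished */
        polls++;
    return polls;
}

/* On real hardware (hypothetical call site):
   wait_blit((volatile const uint16_t *)DMACONR_ADDR);  */
```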

>this manifest itself with QBlit?  I don't know.  My tests of
>multitasking with QBlit have consisted of doing DIRs of DF0: at the
>same time my graphics are running.  Result? Both processes go slow,
>but the blitter seems to be shared correctly.
>
>Consider also QBlit's sibling QBSBlit.  Set the beam sync element of
>your blitnode to a vertical scan-line position and the OS will attempt
>to set an interrupt based on the 60Hz CIA timer such that your blitter
>routine is called after the e-beam has passed the indicated scan-line.
>Semiuseful if you're being niggling with display memory and you don't
>mind the occasional glitch when the CPU can't get to your blit in
>time.  (When you're making star hash out of some viscous,
>evil-smelling alien, a little extra sparkle in the debris generally
>goes unnoticed.)  See the AutoDocs for more info.

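  Some of us purr-fectionists will notice, and exclaim, "Hack!" :-).
Double or triple buffering should handle any animation situation, since
you only ever draw into a buffer that isn't being displayed and switch
to it during vertical blank.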
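With two buffers the renderer must wait for the flip before it can reuse the old screen; a third buffer removes that wait. A minimal C sketch of the bookkeeping, using hypothetical buffer indices 0 to 2 rather than real Amiga calls (the actual display switch, e.g. updating the bitplane pointers in the copper list, is not shown):

```c
/* Roles of three hypothetical frame buffers (indices 0, 1, 2). */
typedef struct {
    int shown;    /* buffer currently being displayed */
    int ready;    /* finished frame waiting for vertical blank, or -1 */
    int drawing;  /* buffer the renderer is drawing into */
} TripleBuffer;

static void tb_init(TripleBuffer *tb)
{
    tb->shown = 0;
    tb->ready = -1;
    tb->drawing = 1;
}

/* Renderer finished a frame: queue it and move on to a free buffer,
   so drawing never waits for the display. */
static void tb_frame_done(TripleBuffer *tb)
{
    int next;

    if (tb->ready < 0)
        next = 3 - tb->shown - tb->drawing;  /* the buffer with no role */
    else
        next = tb->ready;                    /* reuse a frame never shown */
    tb->ready = tb->drawing;
    tb->drawing = next;
}

/* Vertical blank: show the newest finished frame, if any.  The buffer
   that was on screen becomes free. */
static void tb_vblank(TripleBuffer *tb)
{
    if (tb->ready >= 0) {
        tb->shown = tb->ready;
        tb->ready = -1;
    }
}
```

The displayed buffer is never written to, so the screen only ever shows complete frames.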

>So, even though the blitter may be slower than a 68030, it still
>represents a powerful resource when used properly.  By the way, which

  You are right. Currently, for block moves, the blitter says to the
processor, "You can't touch this!"

>processor will draw a line faster?  The blitter (once started) can set
>a single pixel in a line every 6 7MHz-bus cycles (1.17E6 pixels/sec).
>A line algorithm in Steve Williams' "68030 Assembly Language Reference"
>shows about 50 instructions for implementing the Bresenham Line
>Algorithm.  Unfortunately, this otherwise excellent reference has no
>instruction timings, but I'll be generous and allow 4 cycles for each
>instruction.  At 28MHz, this gives us about .14E6 pixels/sec.  Hmmm.
>Maybe a 68040 could do better.

  Heh, heh.  That example doesn't work: it draws nice 45-degree lines, and
that's it.  I use a high-performance fixed-point method derived from
"68000 Assembly Language" by Krantz and Stanley.
Take a look at the main loop (this is for a four-bitplane display,
40 bytes per row; A2-A5 are the bitplane pointers, D4/D5 hold x and y
in 16.16 fixed point, and D6/D7 are the per-pixel x and y increments):

PMD [YAAD] Program Module Dismemberer V.14
Copyright (c) 1990, HoweSoft,Inc. All Rights Reserved.
*  Program_unit #0 name "<UNNAMED>".
*  Code_Hunk [PUBLIC] #0 Length = 60 bytes [15 longwords]
loop:	move.l	D4,D0				; 4	CYCLES
	move.l	D5,D1				; 4	CYCLES
	add.l	A0,D0				; 8	CYCLES
	add.l	A1,D1				; 8	CYCLES
	swap	D0				; 4	CYCLES	D0.w = integer x
	swap	D1				; 4	CYCLES	D1.w = integer y
	move.w	D1,D3				; 4	CYCLES
	add.w	D1,D1				; 4	CYCLES	D1 = 2y
	add.w	D1,D1				; 4	CYCLES	D1 = 4y
	add.w	D1,D3				; 4	CYCLES	D3 = 5y
	lsl.w	#3,D3				; 12	CYCLES	D3 = 40y (row offset)
	move.w	D0,D1				; 4	CYCLES
	lsr.w	#3,D0				; 12	CYCLES	D0 = x/8 (byte in row)
	add.w	D0,D3				; 4	CYCLES	D3 = byte offset
	andi.w	#$07,D1				; 8	CYCLES
	not.b	D1				; 4	CYCLES	bit 7-(x&7), mod 8
	bset	D1,$00(A2,D3.W)			; 18	CYCLES	plane 0
	bset	D1,$00(A3,D3.W)			; 18	CYCLES	plane 1
	bset	D1,$00(A4,D3.W)			; 18	CYCLES	plane 2
	bset	D1,$00(A5,D3.W)			; 18	CYCLES	plane 3
	add.l	D6,D4				; 8	CYCLES	x += x increment
	add.l	D7,D5				; 8	CYCLES	y += y increment
	dbf	D2,loop				; 10+	CYCLES
	rts					; 16	CYCLES
* 206 Total Cycles (190 per pixel in the loop, plus 16 for the final RTS)
*  Hunk_End.
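For readers who don't speak 68000, here is a rough C transliteration of what the loop appears to do: a fixed-point DDA that steps x and y in 16.16 format and sets the same bit in all four planes. Two things are assumptions, not taken from the listing: the setup code (not shown) is assumed to step the longer axis by exactly 1.0 per pixel, and A0/A1 are assumed to hold 0.5 (0x8000) so that taking the integer part rounds to nearest. The function name fixed_point_line is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define BYTES_PER_ROW 40  /* 320-pixel rows, as in the listing */
#define NUM_PLANES    4   /* A2-A5 in the listing */

/* Coordinates are assumed already clipped (nonnegative, on the bitmap)
   and |dx|, |dy| < 32768 so the 16.16 products fit in 32 bits. */
static void fixed_point_line(uint8_t *planes[NUM_PLANES],
                             int x0, int y0, int x1, int y1)
{
    int32_t dx = x1 - x0, dy = y1 - y0;
    int32_t steps = abs(dx) > abs(dy) ? abs(dx) : abs(dy);
    int32_t x = (int32_t)x0 * 65536;    /* D4: x in 16.16 fixed point */
    int32_t y = (int32_t)y0 * 65536;    /* D5: y in 16.16 fixed point */
    int32_t xinc = 0, yinc = 0;         /* D6/D7: per-pixel increments */
    const int32_t round_half = 0x8000;  /* assumed contents of A0/A1 */

    if (steps > 0) {                    /* assumed setup: major axis steps 1.0 */
        xinc = dx * 65536 / steps;
        yinc = dy * 65536 / steps;
    }
    for (int32_t n = 0; n <= steps; n++) {            /* D2 + DBF */
        int px = (int)((x + round_half) >> 16);       /* add + swap */
        int py = (int)((y + round_half) >> 16);       /* add + swap */
        int offset = py * BYTES_PER_ROW + (px >> 3);  /* 40*y + x/8 */
        uint8_t mask = (uint8_t)(0x80 >> (px & 7));   /* bit 7-(x&7) */
        for (int p = 0; p < NUM_PLANES; p++)
            planes[p][offset] |= mask;                /* one BSET per plane */
        x += xinc;
        y += yinc;
    }
}
```

The C hides the one trick the assembly depends on: 40*y is built from a copy, two adds and a shift (5y, then times 8) in 28 cycles, because a MULU costs at least 38 cycles on a 68000.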


  So, what does this work out to on a cached 030? Tough call, so I
tested it in real time against the blitter. It's faster for short lines
and slightly slower for very long lines. If we had faster processor
access to chip RAM, we could really cook.
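For perspective, the nominal figures reduce to clock rate divided by cycles per pixel. A small C sketch, assuming the 190-cycle loop body (the 206 total includes the final RTS) running on a 7.16 MHz NTSC 68000 with no chip-RAM contention; it deliberately does not model the 030's cache or the chip-RAM bus, which is exactly why measuring beats computing here.

```c
#include <stdio.h>

/* Nominal pixel rate: ignores caches, wait states and DMA contention. */
static double pixel_rate(double clock_hz, double cycles_per_pixel)
{
    return clock_hz / cycles_per_pixel;
}

static void print_rates(void)
{
    /* Blitter line mode: one pixel per 6 cycles of the ~7 MHz bus. */
    printf("blitter, 7 MHz / 6:         %.3g px/s\n", pixel_rate(7.0e6, 6));
    /* Quoted estimate: ~50 instructions at ~4 cycles each, 28 MHz. */
    printf("030 estimate, 28 MHz / 200: %.3g px/s\n", pixel_rate(28.0e6, 50 * 4));
    /* The listing's loop body on a plain 68000: 206 - 16 = 190 cycles. */
    printf("68000 loop, 7.16 MHz / 190: %.3g px/s\n", pixel_rate(7.16e6, 190));
}
```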

>(See Tomas Rokicki's BlitLab for an explanation of how to draw lines
>with the blitter.)

  Also, the 1.3 Hardware Manual explains how to draw lines with the blitter
and includes example code.


  John