Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!crdgw1!dollar!barnettj
From: barnettj@dollar.crd.ge.com (Janet A Barnett)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Using CPU instead of Blitter for speed
Message-ID: <8256@crdgw1.crd.ge.com>
Date: 6 Jun 90 19:17:05 GMT
References: <2984@crash.cts.com> <1990Jun3.164446.12193@ameristar> <1990Jun4.134811.12142@watdragon.waterloo.edu> <3009@crash.cts.com>
Sender: news@crdgw1.crd.ge.com
Distribution: comp
Organization: General Electric Corporate R&D Center
Lines: 56

In article <3009@crash.cts.com> jcs@crash.cts.com (John Schultz) writes:
>  Unfortunately, with a bitplane oriented display, you only get parallelism
>on your last bitplane blit. True, you can compute your next blit values before
>waiting, but that buys you little time.

What about the graphics library routine QBlit(blitnode)?  I set up all
my blits ahead of time in a linked list pointed to by the blitnode
argument to QBlit().  Each blitnode contains a pointer to a routine
that handles the actual blitter register stuffing.  QBlit adds my
blits to a queue maintained in GfxBase; when the blitter is ready, THE
OPERATING SYSTEM CALLS my blitter routines.  So what, you say: if the
OS waits, how is that any better than doing my own WaitBlits?  The
trick is that the QBlit routine makes use of the blitter_done
interrupt.  This means that the CPU is free to go about other business
until each blit is done, at which point the interrupt service kicks
in.  If the CLEANUP flag is set in the blitnode structure, a special
routine is called when the list of blits is exhausted.  In my stuff,
the CLEANUP routine usually sends a message to the task that launched
the blits so it knows when the blits are complete.  Seems ideal.
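
To make the control flow above concrete, here is a toy model of such a
queue in portable C.  It is only a sketch: every name in it
(blit_node, queue_blit, blitter_done_interrupt, NODE_CLEANUP, the
demo_* helpers) is hypothetical and is NOT the graphics.library
interface, and a "blit" here completes the instant its routine
returns.  It assumes a node's routine returns nonzero while it still
has blits left to start.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the graphics.library API. */
#define NODE_CLEANUP 0x40        /* model flag: run cleanup when node is done */

struct blit_node {
    struct blit_node *next;
    int  (*function)(struct blit_node *); /* "stuffs the registers"; returns
                                             nonzero if more blits remain   */
    void (*cleanup)(struct blit_node *);  /* called once the node is done   */
    unsigned char stat;                   /* flags, e.g. NODE_CLEANUP       */
    void *user;                           /* caller's private data          */
};

struct blit_queue {
    struct blit_node *head, *tail;
};

/* Append a node to the queue (stands in for queueing a blit request). */
void queue_blit(struct blit_queue *q, struct blit_node *n)
{
    assert(q != NULL && n != NULL && n->function != NULL);
    n->next = NULL;
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
}

/* Stands in for the blitter-done interrupt: start the next blit of the
 * node at the head of the queue.  Returns 1 if a blit was started,
 * 0 if the queue was empty. */
int blitter_done_interrupt(struct blit_queue *q)
{
    struct blit_node *n = q->head;
    if (n == NULL) return 0;
    if (n->function(n) == 0) {            /* node has no further blits */
        q->head = n->next;
        if (q->head == NULL) q->tail = NULL;
        if ((n->stat & NODE_CLEANUP) && n->cleanup) n->cleanup(n);
    }
    return 1;
}

/* Demo callbacks: one blit per bitplane, then flag completion. */
struct demo_state { int planes_left; int blits_started; int done; };

int demo_blit(struct blit_node *n)
{
    struct demo_state *s = n->user;
    s->blits_started++;
    return --s->planes_left > 0;          /* nonzero: call me again */
}

void demo_cleanup(struct blit_node *n)
{
    ((struct demo_state *)n->user)->done = 1;  /* e.g. signal the owner task */
}
```

Queue a node whose user data is a demo_state with planes_left = 3 and
call blitter_done_interrupt() three times: each call starts one plane,
and the cleanup routine fires on the third.  Between those calls the
"CPU" is free, which is the whole point.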

Of course, everything has caveats.  Obviously, there is more overhead
in QBlit than OwnBlitter/WaitBlit/DisownBlitter, but hopefully you can
make use of the time you don't spend in WaitBlit.  Further, if your
blits are small, it is possible that one interrupt will start several
blits in a row, probably resulting in no real net improvement from the
parallel nature of the coprocessor.  And, I read once, long ago, that
there was a problem with WaitBlit in a heavily loaded system. Could
this manifest itself with QBlit?  I don't know.  My tests of
multitasking with QBlit have consisted of doing DIRs of DF0: at the
same time my graphics are running.  Result? Both processes go slow,
but the blitter seems to be shared correctly.

Consider also QBlit's sibling QBSBlit.  Set the beam sync element of
your blitnode to a vertical scan-line position and the OS will attempt
to set an interrupt based on the 60Hz CIA timer such that your blitter
routine is called after the e-beam has passed the indicated scan-line.
Semi-useful if you're being niggling with display memory and you don't
mind the occasional glitch when the CPU can't get to your blit in
time.  (When you're making star hash out of some viscous,
evil-smelling alien, a little extra sparkle in the debris generally
goes unnoticed.)  See the AutoDocs for more info.

So, even though the blitter may be slower than a 68030, it still
represents a powerful resource when used properly.  By the way, which
processor will draw a line faster?  The blitter (once started) can set
one pixel of a line every 6 cycles of the 7MHz bus (1.17E6 pixels/sec).
A line routine in Steve Williams' "68030 Assembly Language Reference"
takes about 50 instructions to implement the Bresenham line
algorithm.  Unfortunately, this otherwise excellent reference has no
instruction timings, but I'll be generous and allow 4 cycles for each
instruction.  Charging all 50 instructions to every pixel, at 28MHz
this gives us about .14E6 pixels/sec.  Hmmm.
Maybe a 68040 could do better.
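
The arithmetic behind those two figures, as a small C sketch.  The
function names are made up for illustration, and the inputs are the
rough numbers used above (7MHz bus, 6 cycles per pixel, 50
instructions at 4 cycles each on a 28MHz CPU), not measured timings.

```c
#include <assert.h>

/* Pixels per second when one pixel costs a fixed number of bus cycles. */
double blitter_pixels_per_sec(double bus_hz, double cycles_per_pixel)
{
    assert(cycles_per_pixel > 0.0);
    return bus_hz / cycles_per_pixel;
}

/* Pixels per second when every pixel costs a fixed number of
 * instructions, each taking the same number of clock cycles. */
double cpu_pixels_per_sec(double clock_hz, double instrs_per_pixel,
                          double cycles_per_instr)
{
    assert(instrs_per_pixel > 0.0 && cycles_per_instr > 0.0);
    return clock_hz / (instrs_per_pixel * cycles_per_instr);
}
```

blitter_pixels_per_sec(7.0e6, 6.0) comes to roughly 1.17E6 and
cpu_pixels_per_sec(28.0e6, 50.0, 4.0) to 1.4E5, so on these
assumptions the blitter is ahead by a factor of about eight.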

(See Tomas Rokicki's BlitLab for an explanation of how to draw lines
with the blitter.)
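
For the CPU side of that comparison, a minimal integer Bresenham in C
gives an idea of the per-pixel work involved.  This is a generic
textbook formulation, not the 68030 listing from the Williams book and
not how the blitter's line mode works; draw_line, plot_fn and
record_point are names made up for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Caller-supplied routine that sets one pixel. */
typedef void (*plot_fn)(int x, int y, void *ctx);

/* Draw a line from (x0,y0) to (x1,y1) inclusive, in any octant.
 * Returns the number of pixels plotted. */
int draw_line(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
    int dy = -abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;                 /* combined error term */
    int count = 0;

    assert(plot != NULL);
    for (;;) {
        int e2;
        plot(x0, y0, ctx);
        count++;
        if (x0 == x1 && y0 == y1) break;
        e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
    return count;
}

/* Example plot routine: just remembers the last pixel it was given. */
struct last_point { int x, y; };

void record_point(int x, int y, void *ctx)
{
    struct last_point *p = ctx;
    p->x = x;
    p->y = y;
}
```

Each pixel costs a handful of adds, compares and branches plus
whatever the plot routine does to find the right bit in the right
bitplane, which is where a hand-coded routine picks up its
instruction count.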