Path: utzoo!attcan!uunet!samsung!uakari.primate.wisc.edu!sdd.hp.com!ucsd!helios.ee.lbl.gov!nosc!crash!jcs
From: jcs@crash.cts.com (John Schultz)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Using CPU instead of Blitter for speed
Message-ID: <3035@crash.cts.com>
Date: 7 Jun 90 18:15:23 GMT
References: <1990Jun4.134811.12142@watdragon.waterloo.edu> <3009@crash.cts.com> <8256@crdgw1.crd.ge.com>
Distribution: comp
Organization: Crash TimeSharing, El Cajon, CA
Lines: 118
X-Local-Date: 7 Jun 90 11:15:23 PDT

In article <8256@crdgw1.crd.ge.com> barnettj@dollar.crd.ge.com (Janet A Barnett) writes:
>In article <3009@crash.cts.com> jcs@crash.cts.com (John Schultz) writes:
>>  Unfortunately, with a bitplane oriented display, you only get parallelism
>>on your last bitplane blit. True, you can compute your next blit values before
>>waiting, but that buys you little time.
>
>What about the graphics library routine QBlit(blitnode)? I set up all
>my blits ahead of time in a linked list pointed to by the blitnode
>argument to QBlit(). The blitnode contains a pointer to a routine to

  I wrote my own blitter interrupt code to clear the screen in rectangular
chunks as opposed to a linear clear of all memory. The interrupt version
was slower.
  The problem with queueing up blits is that your application may not be
able to do anything else until the blits are done. In a flight simulator,
you must draw all of your polygons in the correct order. This means that
the polygons must be transformed, clipped and sorted first, then all of
the drawing must take place. Further, the processor is used to draw
points. The points can't be drawn until the blitter is finished. I was
going to put processor drawing code in the blitter interrupt code, but
the payoff of using blitter interrupts was too little to continue.

>Of course, everything has caveats. Obviously, there is more overhead
>in QBlit than OwnBlitter/WaitBlit/DisownBlitter, but hopefully you can
>make use of the time you don't spend in WaitBlit.
>Further, if your
>blits are small, it is possible that one interrupt will start several
>blits in a row, probably resulting in no real net improvement from the
>parallel nature of the coprocessor. And, I read once, long ago, that
>there was a problem with WaitBlit in a heavily loaded system. Could

  WaitBlit() shouldn't have any problems. If you roll your own, you must
read a chip memory or hardware location before testing the blit done bit,
as described in the 1.3 Amiga Hardware Manual.

>this manifest itself with QBlit? I don't know. My tests of
>multitasking with QBlit have consisted of doing DIRs of DF0: at the
>same time my graphics are running. Result? Both processes go slow,
>but the blitter seems to be shared correctly.
>
>Consider also QBlit's sibling QBSBlit. Set the beam sync element of
>your blitnode to a vertical scan-line position and the OS will attempt
>to set an interrupt based on the 60Hz CIA timer such that your blitter
>routine is called after the e-beam has passed the indicated scan-line.
>Semiuseful if you're being niggling with display memory and you don't
>mind the occasional glitch when the CPU can't get to your blit in
>time. (When you're making star hash out of some viscous,
>evil-smelling alien, a little extra sparkle in the debris generally
>goes unnoticed.) See the AutoDocs for more info.

  Some of us purr-fectionists will notice, and exclaim, "Hack!" :-).
Double or triple buffering should handle any animation situation.

>So, even though the blitter may be slower than a 68030, it still
>represents a powerful resource when used properly. By the way, which

  You are true. Currently, for block moves, the blitter says to the
processor, "You can't touch this!"

>processor will draw a line faster? The blitter (once started) can set
>a single pixel in a line every 6 7MHz-bus cycles (1.17E6 pixels/sec).
>A line algorithm in Steve Williams' "68030 Assembly Language Reference"
>shows about 50 instructions for implementing the Bresenham Line
>Algorithm.
>Unfortunately, this otherwise excellent reference has no
>instruction timings, but I'll be generous and allow 4 cycles for each
>instruction. At 28MHz, this gives us about .14E6 pixels/sec. Hmmm.
>Maybe a 68040 could do better.

  Heh, heh. That example doesn't work. It draws nice 45 degree lines, and
that's it. I use a high performance fixed point method derived from 68000
Assembly Language, by Krantz and Stanley. Take a look at the main loop
(this is for a four bitplane display, 40 bytes per row, a2-a5 are the
bitplane ptrs):

PMD [YAAD] Program Module Dismemberer V.14
Copyright (c) 1990, HoweSoft,Inc. All Rights Reserved.

* Program_unit #0 name "".
* Code_Hunk [PUBLIC] #0 Length = 60 bytes [15 longwords]
        move.l  D4,D0                   ; 4 CYCLES
        move.l  D5,D1                   ; 4 CYCLES
        add.l   A0,D0                   ; 8 CYCLES
        add.l   A1,D1                   ; 8 CYCLES
        swap    D0                      ; 4 CYCLES
        swap    D1                      ; 4 CYCLES
        move.w  D1,D3                   ; 4 CYCLES
        add.w   D1,D1                   ; 4 CYCLES
        add.w   D1,D1                   ; 4 CYCLES
        add.w   D1,D3                   ; 4 CYCLES
        lsl.w   #3,D3                   ; 12 CYCLES
        move.w  D0,D1                   ; 4 CYCLES
        lsr.w   #3,D0                   ; 12 CYCLES
        add.w   D0,D3                   ; 4 CYCLES
        andi.w  #$07,D1                 ; 8 CYCLES
        not.b   D1                      ; 4 CYCLES
        bset    D1,$00(A2,D3.W)         ; 18 CYCLES
        bset    D1,$00(A3,D3.W)         ; 18 CYCLES
        bset    D1,$00(A4,D3.W)         ; 18 CYCLES
        bset    D1,$00(A5,D3.W)         ; 18 CYCLES
        add.l   D6,D4                   ; 8 CYCLES
        add.l   D7,D5                   ; 8 CYCLES
        dbf     D2,-$38(PC)             ; 10+ CYCLES
        rts                             ; 16 CYCLES
* 206 Total Cycles
* Hunk_End.

  So, what does this work out to on a cached 030? Tough call, so I tested
it in real time against the blitter. It's faster for short lines and
slightly slower for very long lines. If we had faster processor access to
chip ram, we could really cook.

>(See Tomas Rokicki's BlitLab for an explanation of how to draw lines
>with the blitter.)

  Also, the 1.3 Hardware Manual explains how to draw lines with the
blitter and includes example code.


  John
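
  For anyone who reads C more easily than a disassembly, here is a rough
sketch of what the main loop in the listing appears to do: step a 16.16
fixed point x and y, take the integer parts, compute the byte offset as
y*40 + x/8, and set the same bit in all four bitplanes. The function name
fixed_line, the 200-row display height, and the bounds assert are
assumptions made for illustration and are not in the original post. The
sketch also folds the origin (A0/A1) and the accumulator (D4/D5) into a
single pair of values, so it is not a literal translation.

```c
#include <assert.h>
#include <stdint.h>

#define PLANES      4      /* four bitplanes, as in the post */
#define ROW_BYTES   40     /* 40 bytes per row, as in the post */
#define ROWS        200    /* ASSUMPTION: display height, not stated in the post */
#define PLANE_BYTES (ROW_BYTES * ROWS)

/* Hypothetical helper: sets count + 1 pixels in every plane, the way a
 * dbf-terminated loop runs count + 1 times.
 * x_fix/y_fix are 16.16 fixed point positions; dx_fix/dy_fix are the
 * 16.16 steps added after each pixel. */
static void fixed_line(uint8_t *planes[PLANES],
                       uint32_t x_fix, uint32_t y_fix,
                       int32_t dx_fix, int32_t dy_fix,
                       uint16_t count)
{
    for (uint32_t i = 0; i <= count; i++) {
        uint32_t x = x_fix >> 16;   /* integer part, like swap + word access */
        uint32_t y = y_fix >> 16;

        /* No clipping is done here; the caller must clip beforehand. */
        assert(x < ROW_BYTES * 8 && y < ROWS);

        uint32_t offset = y * ROW_BYTES + (x >> 3);     /* y*40 + x/8 */
        uint8_t  mask   = (uint8_t)(0x80u >> (x & 7));  /* bit 7 = leftmost pixel */

        /* Set the pixel in every plane (all color bits on). */
        for (int p = 0; p < PLANES; p++)
            planes[p][offset] |= mask;

        /* Unsigned addition wraps, so negative steps also work. */
        x_fix += (uint32_t)dx_fix;
        y_fix += (uint32_t)dy_fix;
    }
}
```

  Calling fixed_line(planes, 0, 0, 1 << 16, 1 << 15, 15) would plot 16
pixels, stepping one pixel in x and half a pixel in y each time. Any slope
works as long as the larger of the two steps is 1.0, which is the point of
the fixed point approach over a loop that only handles 45 degree lines.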