Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!ucla-cs!zen!ucbvax!hplabs!analog!kim
From: kim@analog.UUCP (Kim Helliwell)
Newsgroups: comp.sys.mac
Subject: Re: image rotation
Message-ID: <602@analog.UUCP>
Date: 16 Dec 87 18:43:19 GMT
References: <600@analog.UUCP> <76000070@uiucdcsp>
Organization: Analog Design Tools, Menlo Park, CA
Lines: 62
Summary: I beg to differ

In article <76000070@uiucdcsp>, gillies@uiucdcsp.cs.uiuc.edu writes:
> 
> I think your analysis of the parallel bitblt algorithm is fallacious.
> The speed of a BitBLT EVEN IN ASSEMBLY LANGUAGE is approx n*n/16 or
> n*n/32, depending upon the word size of your machine.  See [Deutsch84]
> to learn how to dynamically "compile" a bitblt request into machine
> language, then execute it like a bat out of hell, a type of self
> modifying (self-compiling?) algorithm.  For the particular mac rotation
> problem, you could hand-code each of the 1..9 bitblt operations.
> 
> ...
> ...
> 
> (n*n/16)*log(2,512) == (n*n/2) time.
> 
> Thus the blitter algorithm should *always* outperform the
> simple-minded bit-by-bit transfer algorithm on the macintosh.

I think you are the one with the fallacious argument.  Factors like
16 or log(some constant) are irrelevant for purposes of analyzing
CPU time requirements.

My original analysis was sketchy, but here it is in a slightly expanded
form:

	Suppose that I wish to rotate an N X N bitmap.  If I do it the
brute-force way, it will take N*N operations (where an operation is
defined as a memory transfer with a masking operation thrown in somewhere (to
get at the right bit of a particular byte.  Ok, have it your way--to get at
the right bit of a long word).

If I use the recursive bitBLT version, it will take log(N) steps (this is
obviously a log base 2), and each step consists of several bitblt operations.
I counted 15 for the version I was using, but I suppose that this could
be reduced for by some clever rewrite of the blit operation.  I won't quibble
about the factor of 15--it's irrelevent when compared to the powers of
N we are working with.  So let's suppose that we somehow magically manage
to do the whole thing with 1 blit per step(!)  What is it we are doing?
We are moving some bytes (OK, longwords) from  one place to another in
memory, again with a masking operation thrown in.  In the real algorithm,
many of the blit's involved moving only half or a quarter of the full
bitmap, but again this is a factor that is irrelevant.  I figure that
I gave up the factor of 15, so it's only fair for you to concede the 
factor of 4 or 2, and we'll say for the purposes of argument that the
one blit per recursion involves the whole bitmap, which is N*N/(8,16,or 32).
The constant factors are ignorable by the principal already enunciated, so
I get the the blit version is O(N*N*log(N)).

Why do we ignore the constant factors?  Because, although in specific
cases you could show that for certain values of N, and for specific versions
of the two algorithms, that the blit version would win, the thing that is
relevant is what happens as N increases.  At some point, N*N*log(N) will
become bigger than N*N, NO MATTER WHAT CONSTANT FACTORS MULTIPLY THEM.
That is why there is so much discussion of sorting in the literature
discussing relative merits of various algorithms for various sizes of 
problem--often a less overall efficient algorithm for a specific case is
a win because the size of the problem and the speed of the comparison
fits the method better.

BTW, the brute force method wins in another way.  That is, it can be used
to rotate non-square bitmaps and non-integral values of N, which can be
a big win if you want to rotate a long narrow strip (which I did).