Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!ucla-cs!zen!ucbvax!hplabs!analog!kim From: kim@analog.UUCP (Kim Helliwell) Newsgroups: comp.sys.mac Subject: Re: image rotation Message-ID: <602@analog.UUCP> Date: 16 Dec 87 18:43:19 GMT References: <600@analog.UUCP> <76000070@uiucdcsp> Organization: Analog Design Tools, Menlo Park, CA Lines: 62 Summary: I beg to differ In article <76000070@uiucdcsp>, gillies@uiucdcsp.cs.uiuc.edu writes: > > I think your analysis of the parallel bitblt algorithm is fallacious. > The speed of a BitBLT EVEN IN ASSEMBLY LANGUAGE is approx n*n/16 or > n*n/32, depending upon the word size of your machine. See [Deutsch84] > to learn how to dynamically "compile" a bitblt request into machine > language, then execute it like a bat out of hell, a type of self > modifying (self-compiling?) algorithm. For the particular mac rotation > problem, you could hand-code each of the 1..9 bitblt operations. > > ... > ... > > (n*n/16)*log(2,512) == (n*n/2) time. > > Thus the blitter algorithm should *always* outperform the > simple-minded bit-by-bit transfer algorithm on the macintosh. I think you are the one with the fallacious argument. Factors like 16 or log(some constant) are irrelevant for purposes of analyzing CPU time requirements. My original analysis was sketchy, but here it is in a slightly expanded form: Suppose that I wish to rotate an N X N bitmap. If I do it the brute-force way, it will take N*N operations (where an operation is defined as a memory transfer with a masking operation thrown in somewhere (to get at the right bit of a particular byte. Ok, have it your way--to get at the right bit of a long word). If I use the recursive bitBLT version, it will take log(N) steps (this is obviously a log base 2), and each step consists of several bitblt operations. I counted 15 for the version I was using, but I suppose that this could be reduced for by some clever rewrite of the blit operation. I won't quibble about the factor of 15--it's irrelevent when compared to the powers of N we are working with. So let's suppose that we somehow magically manage to do the whole thing with 1 blit per step(!) What is it we are doing? We are moving some bytes (OK, longwords) from one place to another in memory, again with a masking operation thrown in. In the real algorithm, many of the blit's involved moving only half or a quarter of the full bitmap, but again this is a factor that is irrelevant. I figure that I gave up the factor of 15, so it's only fair for you to concede the factor of 4 or 2, and we'll say for the purposes of argument that the one blit per recursion involves the whole bitmap, which is N*N/(8,16,or 32). The constant factors are ignorable by the principal already enunciated, so I get the the blit version is O(N*N*log(N)). Why do we ignore the constant factors? Because, although in specific cases you could show that for certain values of N, and for specific versions of the two algorithms, that the blit version would win, the thing that is relevant is what happens as N increases. At some point, N*N*log(N) will become bigger than N*N, NO MATTER WHAT CONSTANT FACTORS MULTIPLY THEM. That is why there is so much discussion of sorting in the literature discussing relative merits of various algorithms for various sizes of problem--often a less overall efficient algorithm for a specific case is a win because the size of the problem and the speed of the comparison fits the method better. BTW, the brute force method wins in another way. That is, it can be used to rotate non-square bitmaps and non-integral values of N, which can be a big win if you want to rotate a long narrow strip (which I did).