Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!pdn!oz.paradyne.com!alan
From: alan@oz.paradyne.com
Newsgroups: comp.arch
Subject: BitBlt, new instructions for RISC.
Message-ID: <7466@pdn.paradyne.com>
Date: 23 Feb 90 20:32:33 GMT
Sender: usenet@pdn.paradyne.com
Reply-To: alan@oz.paradyne.com ()
Organization: AT&T Paradyne, Largo, Florida
Lines: 78

In an email message to me, bbd@rice.edu wrote:

>In article <7398@pdn.paradyne.com> I wrote:
>>Another important case is when drawing vertical lines.
>>Remember, 32 * 32 = 1024; it only takes 32 32-bit words to store a full line
>>of monochrome pixels on a megapixel (1024x1024) screen.

>[On most systems, doesn't this special case occur for horizontal,
>rather than vertical lines? My next 2 paragraphs assume so...]

>The line only fits into the 32 words if it is going in the proper (eg.
>horizontal) direction. If the line goes the other way (eg. vertical),
>it takes 1024 32-bit words to store it, if it's stored in the obvious
>way.

To which I attempted to reply via email. However:

   ----- Transcript of session follows -----
>>> RCPT To:
<<< 550 ... User unknown
550 rice.edu!bbd... User unknown

So here is my reply:

I apologize: I did not realize how easy it would be to misinterpret what I
wrote. I did not mean to imply that pixels which are vertically sequential
are also stored sequentially in memory. The point of my comment was how
"narrow" the screen is in terms of words of memory, to make it graphically :-)
obvious that many blits fall within a single word or two. The point about
vertical lines is that they always (unless they, or the pixels themselves,
are very thick) fall within one or two words horizontally.

>>Also, with burst-mode cache refills and sequential word accesses, the cache
>>miss penalty is not that significant: you only "miss" one word in sixteen
>>(typically).

>Eh?
>If I'm reading a character glyph from a bitmap that is
>approximately the size of the screen, then each row of the glyph is
>separated by about 30 32-bit words, right? That would play hell with
>burst-mode refilled caches, right?

Again, I apologize: my comment was addressed to the big bitblt case that
the original poster was concerned about. However, be advised that character
glyphs may be stored one glyph per bitmap (so that the entire glyph bitmap
fits into a single cache line), or perhaps one whole font face per bitmap.
In any case, characters are normally blitted in a loop which writes a whole
string to the screen, often soon followed by another. So it is likely that
the most frequently used characters would have their pixels in the cache
already.

>>then the total loop cycle count is 8, an 11% improvement. Isn't it normally
>>considered worthwhile to add an instruction if it improves performance by as
>>little as 2 or 3 percent?

>Perhaps it's worthwhile if it improves overall performance by 2 or 3%.
>You've only (theoretically) improved performance of the _blitter_ by
>8% or 11%. Will the "bmerge" (or whatever) instruction be used
>anywhere else? Sure, a compiler could be _taught_ to use it, but it
>won't bother to emit this instruction unless the magical pattern "(Rx
>& Ry) | (Rz & ~Ry)" shows up in the intermediate code. Where else would
>this occur beside BitBlt?

BMERGE would be useful for dealing with unaligned accesses (remember those?).
That's essentially what BitBlt does: unaligned accesses. Most CPUs are at
least byte aligned, if not half-word (68000) or word aligned (RISC). BitBlt
is not necessary on a bit-aligned CPU.

>> Don't people spend mongo$ to increase the cache size for no more than
>> a 10% performance improvement?

>Again, I think these mongo$ are going for an across-the-board 10%
>performance improvement.

Well, suppose you want to market your CPU as a graphics engine?
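
For concreteness, here is a minimal C sketch of the merge pattern quoted
above, "(Rx & Ry) | (Rz & ~Ry)", and of how it covers a store that does not
line up with a word boundary. The names bmerge and store_field, the argument
order, and the numbering of bits from the least significant end are all
assumptions made for illustration; no particular instruction set is implied.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise merge: take bits from src where mask is 1 and from dst where
   mask is 0. This is "(Rx & Ry) | (Rz & ~Ry)" with Rx = src, Ry = mask,
   Rz = dst. On a CPU without such an instruction it costs several ALU ops. */
static uint32_t bmerge(uint32_t src, uint32_t mask, uint32_t dst)
{
    return (src & mask) | (dst & ~mask);
}

/* Hypothetical helper: write the low `width` bits of `value` into *word,
   starting at bit `lsb` (bit 0 = least significant), leaving every other
   bit of *word untouched. The field must fit inside the 32-bit word. */
static void store_field(uint32_t *word, uint32_t value,
                        unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb <= 32 - width);

    /* Build a run of `width` one-bits, avoiding an undefined 32-bit shift. */
    uint32_t ones = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    uint32_t mask = ones << lsb;

    *word = bmerge(value << lsb, mask, *word);
}
```

The same read-merge-write step shows up whenever only part of a word may be
overwritten, for example at the left and right edges of each destination row
in a blit, or when storing a byte on a machine that can only write whole words.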

____"Congress shall have the power to prohibit speech offensive to Congress"____
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me.
Mottos: << Many are cold, but few are frozen. >> << Frigido, ergo sum. >>