Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!att!cbnewsk!cbnewsj!dwex
From: dwex@cbnewsj.att.com (david.e.wexelblat)
Newsgroups: comp.sys.3b1
Subject: Re: Replacement for wind.o
Keywords: MGR
Message-ID: <1991May6.152805.10583@cbnewsj.att.com>
Date: 6 May 91 15:28:05 GMT
References: <1991May3.163220.24448@cbnewsj.att.com> <1991May4.062447.7923@yenta.alb.nm.us>
Organization: AT&T Bell Laboratories
Lines: 64

In article <1991May4.062447.7923@yenta.alb.nm.us> dt@yenta.alb.nm.us (David B. Thomas) writes:
[stuff deleted]
>
> 3. My new job has me writing bit blit routines in assembly languages all day
> long. What's one more? I'm going to code all of mgr's bitblits in 68010
> assembler and get this baby cookin'.
>
> little david
> --
> Unix is not your mother.

Are you aware of the loop-mode instructions for the 68010? They are
discussed on the last few pages of the 68000-68008-68010 book from
Motorola. I did some testing, and for long copies (more than ~100 bytes)
they are a whole lot faster.

Apparently the compiler doesn't use them. I wrote a memcpy()-type
routine and compiled it with and without the optimizer, and neither
version used these instructions. The libc.a versions do use them, so
either those were hand-coded in assembler, were hand optimized, were
built with a different compiler, or I'm missing something.

The MGR bitblt could be sped up a lot just by using these instructions.
The way they work (this is from memory; my book is at home) is as follows.
Given a normal copy function:

    for (i=100; i > 0; i--)
        *dest++ = *src++;

the compiler outputs something like:

        mov.l   &100,%d0
        mov.l   dest,%a0
        mov.l   src,%a1
    top:
        mov.b   (%a1)+,(%a0)+
        sub.l   &1,%d0
        bgt     top

Convert this to

        mov.l   &99,%d0
        mov.l   dest,%a0
        mov.l   src,%a1
    top:
        mov.b   (%a1)+,(%a0)+
        dbf     %d0,top

(dbf decrements the low 16 bits of the counter and falls through only
when they reach -1, so the loop runs one more time than the value loaded;
that is why the count is 99 rather than 100.)

The 68010 recognizes this as loop mode (due to its prefetch) and does not
fetch the move or branch instructions again, saving 4 memory accesses per
iteration (1 for mov.b, 1 for sub.l, and 2 for bgt). This is a big win.
Note that it only works for branches with a displacement of -4 (i.e. a
single one-word instruction immediately before the dbxx), which happens
to be ideal for copies.

Anyhow, I think this would make a huge improvement to MGR, since it gave
me approximately 10 times the performance on a quick 1000-byte-copy
benchmark. Check it out.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
David Wexelblat          | dwex@mtgzz.att.com  | I asked her her name.
AT&T Bell Laboratories   | ...!att!mtgzz!dwex  | She said her name was
200 Laurel Ave - 4B-421  |                     | 'Maybe'
Middletown, NJ 07748     | (201) 957-5871      |     --Damn Yankees
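Because dbf falls through only when the low 16 bits of its counter reach -1, a dbf loop runs one more time than the value loaded into the counter, and a single counter can cover at most 65536 passes. The plain-C sketch below models just that counting rule. It is not 68010 code, the names dbf_iterations and dbf_copy are hypothetical, and whether a given compiler turns this do/while shape into a dbf loop is not guaranteed (the post reports that the 3B1 compiler did not use these instructions).

```c
#include <stddef.h>

/*
 * Models the count semantics of the 68010 "dbf %dN,label" instruction:
 * after each pass the low 16 bits of the counter are decremented, and
 * the branch back is taken unless the result is -1 (0xFFFF).  Starting
 * from count, the loop body therefore runs count + 1 times.
 */
unsigned long dbf_iterations(unsigned short count)
{
    unsigned long iterations = 0;

    do {
        iterations++;                 /* stands in for the loop body */
    } while (--count != 0xFFFFu);     /* dbf: decrement, exit at -1 */

    return iterations;
}

/*
 * Hypothetical byte copy written in the same shape as the dbf loop:
 * preload a 16-bit counter with (bytes - 1).  Because dbf only counts
 * in 16 bits, copies longer than 65536 bytes are split into chunks.
 */
void dbf_copy(unsigned char *dest, const unsigned char *src, size_t n)
{
    while (n > 0) {
        size_t chunk = n > 65536UL ? 65536UL : n;
        unsigned short count = (unsigned short)(chunk - 1);

        n -= chunk;
        do {
            *dest++ = *src++;         /* mov.b (%a1)+,(%a0)+ */
        } while (--count != 0xFFFFu); /* dbf %d0,top */
    }
}
```

For example, dbf_iterations(99) returns 100, which is why a 100-byte copy preloads the counter with 99. The chunking loop in dbf_copy stands in for the outer loop a hand-coded routine might wrap around a 16-bit dbf counter when copying more than 64K bytes.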