Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker!bloom-beacon!EXPO.LCS.MIT.EDU!keith
From: keith@EXPO.LCS.MIT.EDU (Keith Packard)
Newsgroups: comp.windows.x
Subject: Re: The easiest way to speed up X11R4
Message-ID: <9002081626.AA21959@xenon.lcs.mit.edu>
Date: 8 Feb 90 16:26:23 GMT
References: <8968@portia.Stanford.EDU>
Sender: daemon@athena.mit.edu (Mr Background)
Organization: The Internet
Lines: 46

> Suppose I wanted to spend, say, a day making X11R4 faster for my
> particular hardware. What would be the best way to spend my time?
> I am interested in 8 bit color speedups at the expense of portability.

As usual, this depends almost completely on what you will be using the
server for. The R4 server is actually pretty good at a wide range of
common tasks.

> The README for the cfb directory says the cfb code is "very slow".

The README file was not updated for R4. Look at the CHANGES file and
you'll see a more promising comment:

	"This directory now provides a real implementation for 8-bit
	frame buffers, driving the frame buffer at memory bandwidth for
	many operations"

The most heavily tuned operations are BitBlt, text painting, line
drawing and rectangle filling. For these operations, you'd be hard
pressed to get much of a performance increase even by coding them in
assembly.

> For those interested in receiving a copy, I'll be doing this for
> the MC68030.

The R4 server has code which is tuned for the 68020 family; it allows
you to specify a few machine characteristics which guide the compiler
to the correct bits of code.

> Even more specifically, it will be for the Macintosh II.

This is the bad news. The Mac II frame buffer cards which sit on the
NuBus have a memory latency of ~1us per access and no special
block-mode optimizations, which gives them a bandwidth of 4 MB/sec
(4 bytes/access). Nothing the R4 server does can help with this; you
end up with a server which runs 2.5 times slower than a Sun 3/60.

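To make the arithmetic behind the 4 MB/sec figure explicit, here is a small C sketch. It assumes every access pays the full latency, with no burst or block-mode transfers, as described above. The function names are hypothetical and are not part of the R4 server; the 640x480 screen size in the note below is likewise only an assumed example.

```c
#include <stddef.h>

/*
 * Sustained bandwidth in MB/sec (10^6 bytes per second), assuming each
 * access costs the full latency and moves bytes_per_access bytes.
 * Bytes per microsecond is numerically the same as MB/sec.
 */
double bandwidth_mb_per_sec(double latency_us, unsigned bytes_per_access)
{
    return (double)bytes_per_access / latency_us;
}

/*
 * Milliseconds needed to touch every byte of a buffer once at that rate.
 * A sketch only: it does not round a partial final access up.
 */
double fill_time_ms(size_t nbytes, double latency_us, unsigned bytes_per_access)
{
    double accesses = (double)nbytes / (double)bytes_per_access;
    return accesses * latency_us / 1000.0;
}
```

With the numbers from this post, bandwidth_mb_per_sec(1.0, 4) gives 4.0, and fill_time_ms(640 * 480, 1.0, 4) comes to about 76.8 ms for one pass over an 8-bit 640x480 screen. That is a floor set by the bus, which is why no amount of server tuning gets around it.
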
Some of the newer Macintoshes have on-board frame buffers. I expect
that eliminating the NuBus would give them more respectable frame
buffer latency numbers, but I haven't ever had a chance to benchmark
them running our code.

If you want to start tuning the R4 code, get a copy of x11perf and
start measuring. As usual, any changes you make would be welcome at
MIT if directed at 'xbugs@expo.lcs.mit.edu'.

Keith Packard
MIT X Consortium