Xref: utzoo comp.graphics:3936 comp.windows.x:6832 Path: utzoo!utgpu!watmath!onfcanim!dave From: dave@onfcanim.UUCP (Dave Martindale) Newsgroups: comp.graphics,comp.windows.x Subject: Re: Luminance from RGB Message-ID: <16960@onfcanim.UUCP> Date: 20 Dec 88 16:00:32 GMT References: <8241@pasteur.Berkeley.EDU> <518@midgard.Midgard.MN.ORG> <16929@onfcanim.UUCP> <7086@venera.isi.edu> Reply-To: dave@onfcanim.UUCP (Dave Martindale) Organization: National Film Board / Office national du film, Montreal Lines: 46 In article <7086@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes: > >>But it's faster to *implement* almost any function using table lookup. > > This is true for relatively complex functions, but not usually > for those that break down easily to simple operations such as > shifts and adds. I've measured speed improvements up to a > factor of 14 over ordinary C code in the most extreme case > by moving a critical algorithm to assembly language and using > this sort of shifty logic. Is that a factor of 14 speed improvement over table lookup? Or a factor of 14 improvement over code that contained multiply or divide in the inner loop? A comparison against table lookup is what matters. What processor was this on? For shift/add to be faster than table lookup, the time required to do the shifts and adds must be less than that required to do the table addressing calculations and the extra memory fetch. (Note that the table word will usually be in the cache on machines that have a cache.) On machines like the VAX, where the table addressing calculations can be done as part of a "move" instruction but the shifts and adds are done as separate instructions (and shift is very slow on some models), table lookup is going to be faster. On another machine where the table addressing requires several instructions but a barrel shifter is available, the shift method will likely be faster. You have to try both, or have a good knowledge of the particular model of the particular architecture of processor you are using, to determine which is faster. However, table lookup has some additional benefits. It is simple enough that carefully-written C code is likely to generate the best possible assembler code, so there is no need to use assembler. A single copy of the lookup code functions for all possible input and output widths, as long as the pixels are always stored in words of a constant width. In contrast, the shift/add code requires different sequences of instructions for different input and output pixel bit widths, since the output may require adding 1 or 2 or 3 or more copies of the input, shifted by various amounts. Either you must pre-compile all possible variations that you could ever need, or compile code during execution, or just have a general-purpose algorithm that needs tests and branches within the pixel lookup loop (bye-bye performance). How do you deal with this?