Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!spool.mu.edu!munnari.oz.au!labtam!graeme
From: graeme@labtam.labtam.oz (Graeme Gill)
Newsgroups: comp.arch
Subject: Re: Bitfield instructions--a good idea?
Keywords: Graphics, Rendering
Message-ID: <10425@labtam.labtam.oz>
Date: 24 Apr 91 02:20:05 GMT
References: <1991Apr15.193425.3436@waikato.ac.nz> <2325@cluster.cs.su.oz.au>
Organization: Labtam Australia Pty. Ltd., Melbourne, Australia
Lines: 52

In article <2325@cluster.cs.su.oz.au>, rex@cs.su.oz (Rex Di Bona) writes:
>
> This is true, but misleading. You do not want to convert 0 -> 00000000 and
> 1 -> 11111111, but 0 -> some colour, and 1 -> some other colour. Both of
> these colours are user selectable.

	The ideal graphics support would be an expand instruction with
foreground and background colour registers, but a 0 -> 00000000 and
1 -> 11111111 instruction would still be very useful when internal
operations are many times faster than memory accesses. The expanded bitmap
is used as a mask to merge the foreground and background colours together
(as well as a plane mask perhaps).

> You should be able to do all of the rendering in software, keeping the
> intermediate values in registers. This requires a better layout of the
> memory for the graphics 'screen'. You could use a lookup table for additional
> speed (as an example, if your video memory was 8 bits deep, and laid out so
> that 4 pixels occupied a 32 bit word you could construct a 16 element
> table (each 32 bits wide) with the appropriate spaces filled in. To set
> up the table would require 16 memory stores, but your rendering would
> be twice as fast (one read, one write as opposed to 4 writes) as the byte
> at a time method). You could remove the read by having all 16 values stored
> in registers, and do a branch to the appropriate store, or other hardware
> nasties.

	Umm.
Packed frame stores are used because they speed up other very important
operations like fill and copy.

	Currently I do an expand operation like this:

	Read 32 bits of source (word read)
	Lookup 8 bits of the source in the expand table (double word read)
	Lookup 8 bits of the source in the expand table (double word read)
	Write to the destination (quad word write).
	Lookup 8 bits of the source in the expand table (double word read)
	Lookup 8 bits of the source in the expand table (double word read)
	Write to the destination (quad word write).

	With expand support I could do this:

	Read 32 bits of source (word read)
	Write to the destination (quad word write).
	Write to the destination (quad word write).

> I would suspect the additional overheads of determining when to use the 88K
> type bitfield operations would swamp the usefulness of them.

	When speed is important, these sorts of routines are often
pre-compiled for various cases (e.g. alignment, direction), so one selects
the appropriate routine, not instruction.

	Graeme Gill
	Labtam Australia
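
For readers who want to see the table-driven expand and the mask merge written out, here is a rough C sketch. It is not the code from the post. It assumes 8-bit-deep pixels, a 256-entry table of 64-bit entries (one "double word" per 8 source bits), and that bit i of the source maps to byte i of the expanded word; the real bit and byte order depends on the frame store. The names expand_table, build_expand_table, replicate8 and expand_word are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical table: entry v holds 8 byte lanes, each 0x00 or 0xFF,
 * one lane per bit of v (assumed order: bit i -> byte i). */
static uint64_t expand_table[256];

static void build_expand_table(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint64_t m = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            if (v & (1u << bit))
                m |= (uint64_t)0xFF << (8 * bit);
        }
        expand_table[v] = m;
    }
}

/* Copy one 8-bit colour into all eight byte lanes of a 64-bit word. */
static uint64_t replicate8(uint8_t colour)
{
    return (uint64_t)colour * 0x0101010101010101ULL;
}

/* Expand 32 source bits into 32 pixels (four 64-bit words).
 * The expanded bits act as a mask: 1 selects fg, 0 selects bg. */
static void expand_word(uint32_t src, uint8_t fg, uint8_t bg, uint64_t dst[4])
{
    uint64_t f = replicate8(fg);
    uint64_t b = replicate8(bg);

    for (int i = 0; i < 4; i++) {
        uint64_t mask = expand_table[(src >> (8 * i)) & 0xFF];
        dst[i] = (mask & f) | (~mask & b);
    }
}
```

Call build_expand_table() once before the first expand_word(). The four table reads inside expand_word() correspond to the four "double word read" lookups listed above, and they are the memory accesses that a 0 -> 00000000, 1 -> 11111111 expand instruction would remove. The merge itself stays in registers either way.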