Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!spool.mu.edu!munnari.oz.au!labtam!graeme
From: graeme@labtam.labtam.oz (Graeme Gill)
Newsgroups: comp.arch
Subject: Re: Bitfield instructions--a good idea?
Keywords: Graphics, Rendering
Message-ID: <10425@labtam.labtam.oz>
Date: 24 Apr 91 02:20:05 GMT
References: <1991Apr15.193425.3436@waikato.ac.nz> <2325@cluster.cs.su.oz.au>
Organization: Labtam Australia Pty. Ltd., Melbourne, Australia
Lines: 52

In article <2325@cluster.cs.su.oz.au>, rex@cs.su.oz (Rex Di Bona) writes:
> 
> This is true, but misleading. You do not want to convert 0 -> 00000000 and
> 1 -> 11111111, but 0-> some colour, and 1-> some other colour. Both of
> these colours are user selectable.

	The ideal graphics support would be an expand instruction with
foreground and background colour registers, but a plain 0 -> 00000000,
1 -> 11111111 instruction would still be very useful when internal
operations are many times faster than memory accesses. The expanded
bitmap then serves as a mask for merging the foreground and background
colours (possibly combined with a plane mask).
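
	A rough C sketch of that merge, for illustration only. It assumes
8-bit pixels packed four to a 32-bit word, with bitmap bit 0 mapping to the
least significant byte; the names expand4 and merge_pixels are made up here
and do not correspond to any real instruction or library.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for the 0 -> 00000000, 1 -> 11111111 expand:
 * each of the low 4 bits of 'bits' becomes one all-zeros or all-ones byte.
 * Assumption: bit 0 maps to the least significant byte. */
uint32_t expand4(unsigned bits)
{
    uint32_t mask = 0;
    assert(bits < 16);
    for (int i = 0; i < 4; i++)
        if (bits & (1u << i))
            mask |= (uint32_t)0xFF << (8 * i);
    return mask;
}

/* Merge foreground and background under the expanded mask, then apply a
 * plane mask so that only the enabled bit planes of dst are changed.
 * fg and bg hold the colour replicated into all four byte lanes. */
uint32_t merge_pixels(uint32_t dst, uint32_t mask,
                      uint32_t fg, uint32_t bg, uint32_t planes)
{
    uint32_t src = (fg & mask) | (bg & ~mask);
    return (src & planes) | (dst & ~planes);
}
```

With the mask already in a register, the merge is a handful of logical
operations and no memory access beyond the final store.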

> You should be able to do all of the rendering in software, keeping the
> intermediate values in registers. This requires a better layout of the
> memory for the graphics 'screen'. You could use a lookup table for additional
> speed (as an example, if your video memory was 8 bits deep, and laid out so
> that 4 pixels occupied a 32 bit word you could construct a 16 element
> table (each 32 bits wide) with the appropriate spaces filled in. To set
> up the table would require 16 memory stores, but your rendering would
> be twice as fast (one read, one write as opposed to 4 writes) as the byte
> at a time method). You could remove the read by having all 16 values stored
> in registers, and do a branch to the appropriate store, or other hardware
> nasties.

	Umm. Packed frame stores are used because they speed up other very
important operations like fill and copy. Currently I do an expand operation
like this:

	Read 32 bits of source (word read)
	Look up the first 8 bits of the source in the expand table (double word read)
	Look up the second 8 bits of the source in the expand table (double word read)
	Write to the destination (quad word write)
	Look up the third 8 bits of the source in the expand table (double word read)
	Look up the fourth 8 bits of the source in the expand table (double word read)
	Write to the destination (quad word write)
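
	The table-driven sequence above might look roughly like this in C.
This is a sketch under assumptions, not the actual routine: the table holds
plain 0x00/0xFF masks (it could equally hold pre-merged colours), bit 0 maps
to the lowest byte, and a quad word write is modelled as two adjacent 64-bit
stores. The names expand_table, build_expand_table and expand_word are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256 entries, one per possible 8-bit source value; each entry is a
 * 64-bit "double word" holding eight byte-wide pixels. */
uint64_t expand_table[256];

void build_expand_table(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint64_t m = 0;
        for (int b = 0; b < 8; b++)
            if (v & (1u << b))
                m |= (uint64_t)0xFF << (8 * b);
        expand_table[v] = m;
    }
}

/* Expand one 32-bit source word into 32 byte-wide pixels.
 * dst[0..1] model the first quad word write, dst[2..3] the second. */
void expand_word(uint32_t src, uint64_t dst[4])
{
    assert(dst != NULL);
    dst[0] = expand_table[src & 0xFF];          /* lookup 1 */
    dst[1] = expand_table[(src >> 8) & 0xFF];   /* lookup 2 */
    dst[2] = expand_table[(src >> 16) & 0xFF];  /* lookup 3 */
    dst[3] = expand_table[(src >> 24) & 0xFF];  /* lookup 4 */
}
```

Call build_expand_table() once before the first expand_word(). The four
table reads per source word are the memory traffic that an expand
instruction would eliminate.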
	
With expand support I could do this:

	Read 32 bits of source (word read)
	Write to the destination (quad word write).
	Write to the destination (quad word write).

> I would suspect the additional overheads of determining when to use the 88K
> type bitfield operations would swamp the usefulness of them.

	When speed is important, these sorts of routines are often pre-compiled
for various cases (e.g. alignment, direction), so one selects the appropriate
routine, not instruction.
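
	As an illustration of selecting a routine per case rather than
deciding per instruction, here is a minimal C sketch using an overlapping
copy. The four variants, the table layout and all names (copy_fn,
select_copy, blit_copy) are assumptions made up for this example; a real
set of routines would cover many more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(uint8_t *dst, const uint8_t *src, size_t n);

static void copy_fwd_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

static void copy_bwd_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--)
        dst[n] = src[n];
}

/* Word variants: only selected when both pointers and n are multiples
 * of 4. memcpy of 4 bytes keeps the sketch free of aliasing problems. */
static void copy_fwd_words(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);
        memcpy(dst + i, &w, 4);
    }
}

static void copy_bwd_words(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = n; i >= 4; i -= 4) {
        uint32_t w;
        memcpy(&w, src + i - 4, 4);
        memcpy(dst + i - 4, &w, 4);
    }
}

/* Indexed by [aligned][backward]. */
static const copy_fn copy_table[2][2] = {
    { copy_fwd_bytes, copy_bwd_bytes },
    { copy_fwd_words, copy_bwd_words },
};

/* Decide once per operation which pre-built routine applies. */
static copy_fn select_copy(const uint8_t *dst, const uint8_t *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    int aligned  = ((d | s | n) & 3) == 0;
    int backward = d > s && d - s < n;   /* dst overlaps the tail of src */
    return copy_table[aligned][backward];
}

void blit_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    select_copy(dst, src, n)(dst, src, n);
}
```

The selection cost is paid once per call, so it is amortised over the
whole span rather than over each word moved.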

	Graeme Gill
	Labtam Australia