Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!gem.mps.ohio-state.edu!usc!apple!sun-barr!newstop!sun!pepper!cmcmanis
From: cmcmanis%pepper@Sun.COM (Chuck McManis)
Newsgroups: comp.sys.amiga
Subject: Re: 3000 wishes
Message-ID: <125964@sun.Eng.Sun.COM>
Date: 6 Oct 89 22:28:03 GMT
References: <4875@cps3xx.UUCP>
Sender: news@sun.Eng.Sun.COM
Reply-To: cmcmanis@sun.UUCP (Chuck McManis)
Organization: Sun Microsystems, Mountain View
Lines: 100

In article <4875@cps3xx.UUCP> porkka@frith.UUCP (Joe Porkka) writes:
> Make sprites work in hires, and allow them to be as wide as a playfield;
> make them as deep (bitplane wise) too... make it so that you can have 32
> independent sprites per scan line.

I designed one hardware graphics system, and helped with another as part of a multiperson group that was working on the Intel 82786. Both had something similar to this, and they attacked the problem in different ways. The difficulties that come up are similar, though.

Sprites and windows can be thought of as a memory management problem. One linear space (the viewscreen) may be composed of several discrete chunks of a larger workspace. On a pixel-by-pixel basis you get to decide where that pixel will come from in the workspace. Fortunately, you can make some optimizations because you know that pixels will be accessed in sequential order.

The problem is access time for the translation tables. Since a scan line may be as short as 14 microseconds (for a non-interlaced 1K X 1K display at 66Hz), you need to do pixel translations in as few as 14 nanoseconds. If you can do 14 nanosecond translations, then you can have an arbitrary number of windows aligned on arbitrary boundaries on your screen. Now, however, if you want to do "sprites" which can be "transparent", you may do your translation only to find out that the sprite you translated to has a transparent pixel, and now you have to find the pixel "under" it.
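As a rough check on those numbers, here is a minimal Python sketch that divides one scan line's period by the number of pixels on the line, which appears to be how the figures above were derived. It ignores horizontal and vertical blanking (an assumption that makes the budget look more generous than it really is), and the function names are made up for illustration. The same division with a roughly 15.734kHz NTSC line rate and 640 pixels lands near the 99ns figure used for the NTSC case.

```python
# Per-pixel time budget: one scan line's period divided by the pixels on it.
# Assumption: blanking intervals are ignored, as in the post's figures, so a
# real design has somewhat less time than this.


def line_period_us(lines_per_frame: int, refresh_hz: float) -> float:
    """Scan-line period in microseconds, ignoring vertical blanking."""
    return 1e6 / (lines_per_frame * refresh_hz)


def pixel_time_ns(pixels_per_line: int, line_us: float) -> float:
    """Time available to translate one pixel, in nanoseconds."""
    return line_us * 1e3 / pixels_per_line


# Non-interlaced 1K x 1K display refreshed at 66 Hz.
hires_line = line_period_us(1024, 66.0)
# NTSC: roughly a 15.734 kHz horizontal scan rate, 640 pixels per line.
ntsc_line = 1e6 / 15_734

print(f"1K x 1K @ 66 Hz: {hires_line:.1f} us/line, "
      f"{pixel_time_ns(1024, hires_line):.1f} ns/pixel")
print(f"NTSC 640 wide:   {ntsc_line:.1f} us/line, "
      f"{pixel_time_ns(640, ntsc_line):.1f} ns/pixel")
```

Both results come out within a nanosecond or so of the post's 14ns and 99ns figures; subtracting blanking time would shrink them further.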
If you used up your 14 nanoseconds getting to the first sprite, you're hosed, because the beam will have moved on.

Anyway, it isn't this bad at NTSC rates. With a 640 X (200/400) screen and a 15kHz scan rate, you only have to map pixels within 99 nanoseconds. So visualize the following scene at the pixel multiplexor:

                       Beam Position
                            X Y
                            | |
   sprite 0  ------\        | |
   sprite 1  ------\\       | |
   sprite 2  ------\\\      V V     +-----+
   sprite 3  -------\\\\  +-----+   |     +---> Red
   sprite 4  -------------+ MUX +---+ DAC +---> Grn
   sprite 5  -------////  +-----+   |     +---> Blu
   sprite 6  -------///             +-----+
   sprite 7  -------//
   playfield ------/

So the MUX, or some sort of arbitration circuit, has to look up the pixel color of sprite 0, and if it's transparent fall through to sprite 1, ..., to sprite 7, and then finally pick up the playfield data. All within the 99ns you have before the beam needs that information. A common way to cheat is to "freeze" the values and start queueing up data from memory when HBLANK hits; you fall behind while fetching during the line, but because you started out ahead, the beam only just catches up to you when you hit the next HBLANK.

So an expensive way to do this might be to put each window in the "proper" place in its own bank of VRAM. [You might be able to multiplex windows that didn't overlap, like VSprites, with clever programming.] Then you scan all banks of VRAM simultaneously for data. In the display unit you simply keep a bunch of address comparators that hold the LeftEdge, TopEdge, RightEdge, and BottomEdge values, all ANDed together so that they generate a "1" bit when the beam is in that "window". Since the propagation time on these comparators is pretty fast (like 10ns), we don't have to worry about that. If you are clever and want to make them sprite-like, you can put in a "zero" detect to AND with the comparator output; that would pull the "we're in this window" bit down if the pixel at that location was zero.
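A behavioral sketch of the comparator-plus-zero-detect arbitration might look like the following Python. It is only a model under stated assumptions: each window's VRAM bank is represented by a hypothetical function returning the color index at a screen position, a zero pixel is treated as transparent, and index 0 in the window list has the highest priority, like sprite 0 in the diagram. It says nothing about timing; in hardware all the hit bits come out of the comparators at once and a priority encoder picks the winner.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: each window's "VRAM bank" is a function returning the
# color index stored at a screen position; 0 means transparent.
Bank = Callable[[int, int], int]


@dataclass
class Window:
    left: int
    top: int
    right: int    # inclusive
    bottom: int   # inclusive
    bank: Bank

    def in_window(self, x: int, y: int) -> bool:
        """The four edge comparators ANDed together."""
        return (self.left <= x <= self.right) and (self.top <= y <= self.bottom)

    def hit(self, x: int, y: int) -> bool:
        """Comparator output ANDed with a zero detect: a zero pixel drops the bit."""
        return self.in_window(x, y) and self.bank(x, y) != 0


def select_pixel(windows: List[Window], playfield: Bank, x: int, y: int) -> int:
    """Pick the topmost non-transparent pixel; index 0 has highest priority."""
    # In hardware every hit bit is computed at once; the list models that.
    hits = [w.hit(x, y) for w in windows]
    # Priority encoder: the first set bit wins, otherwise fall through.
    for i, hit in enumerate(hits):
        if hit:
            return windows[i].bank(x, y)
    return playfield(x, y)


# Usage: sprite 0 is transparent on even columns, sprite 1 is solid.
sprite0 = Window(10, 10, 19, 19, lambda x, y: 0 if x % 2 == 0 else 5)
sprite1 = Window(0, 0, 99, 99, lambda x, y: 7)
playfield: Bank = lambda x, y: 1
for x, y in [(15, 15), (14, 15), (50, 50), (200, 50)]:
    print((x, y), select_pixel([sprite0, sprite1], playfield, x, y))
```

At (14, 15) sprite 0's window is active but its pixel is zero, so the zero detect drops its bit and the pixel falls through to sprite 1, which is the "find the pixel under it" case described earlier.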
Now you divide the pixel clock into 4 subclocks (each ~25ns in this case) and time it like this:

                0               1               2               3
             _______         _______         _______         _______
clock       /       \_______/       \_______/       \_______/       \_______
                    ________________________________________________________
inwin       -------<________________________________________________________>-
                                    ________________________________________
iszero      -----------------------<________________________________________>-
                                                    ________________________
is_top      ---------------------------------------<________________________>-
                                                            ________________
valid_pixel -----------------------------------------------<________________>-

So if you can read my crude timing diagram, everything latches on the falling edge of C0 (the 4X pixel clock), with the result that by the rising edge of phase 3 you can clock the "true" pixel onto the video shifter bus and then out to the DACs. Note that only on the falling edge of phase 2 will you have an accurate picture of which pixel is "topmost"; this comes from arbitrating priorities between the falling edge of phase 1 and the falling edge of phase 2. That means you have to arrive at the correct priority in about 25ns, and given a setup time of 5ns and a settling time of 3-4ns, you have to keep those propagation times down. You can probably do this with an XOR priority encoder scheme.

Anyway, for 32 "window/screen/sprites" you will need 32 banks of VRAM (again, this sets the maximum number of windows on a line; if you can live with fewer windows per line you could reduce that). Assuming 8-bit pixels (this is an improvement, after all) and a 640 X 200+ screen, you will need 512KB of VRAM for each window, for a total of 16MB of VRAM for the display. That is definitely doable, but it will get a bit expensive. Interestingly enough, on a monochrome screen you only need 2MB of VRAM, and that would make for a pretty awesome X terminal or some such.
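The sizing and timing arithmetic can be checked with a few lines of Python. This is a sketch under assumptions not stated in the post: a 640 X 400 window, and VRAM banks whose pitch and height are rounded up to powers of two (1024 X 512), which is one plausible way to reach the 512KB-per-window figure, since 640 X 400 at 8 bits is only 250KB of actual pixels. The priority budget helper simply subtracts the 5ns setup time and a 4ns worst-case settling time from one ~25ns subphase. All helper names are hypothetical.

```python
# Sizing and timing arithmetic for the 32-bank scheme.
# Assumption (not from the post): each VRAM bank has a power-of-two pitch
# and height, so a 640 x 400 window occupies a 1024 x 512 bank.

KB = 1024
MB = 1024 * KB


def next_pow2(n: int) -> int:
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def bank_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes in one window bank, rounding both dimensions up to powers of two."""
    return next_pow2(width) * next_pow2(height) * bits_per_pixel // 8


def priority_budget_ns(pixel_ns: float, subphases: int = 4,
                       setup_ns: float = 5.0, settle_ns: float = 4.0) -> float:
    """Time left to resolve priority in one subphase after setup and settling."""
    return pixel_ns / subphases - setup_ns - settle_ns


BANKS = 32  # maximum number of windows/sprites on one scan line
color_bank = bank_bytes(640, 400, 8)
mono_bank = bank_bytes(640, 400, 1)
print(f"8-bit: {color_bank // KB} KB per bank, {BANKS * color_bank // MB} MB total")
print(f"mono:  {mono_bank // KB} KB per bank, {BANKS * mono_bank // MB} MB total")
print(f"priority budget at 99 ns/pixel: {priority_budget_ns(99.0):.2f} ns")
```

Under these assumptions the 8-bit case comes to 512KB per bank and 16MB total, the monochrome case to 64KB per bank and 2MB total, and roughly 16ns remains for the priority encoder within each subphase.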
--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis   ARPAnet: cmcmanis@sun.com
These opinions are my own and no one else's, but you knew that, didn't you?
"If I were driving a Macintosh, I'd have to stop before I could turn the wheel."