Path: utzoo!attcan!uunet!husc6!bloom-beacon!tut.cis.ohio-state.edu!rutgers!att!mtunb!dmt
From: dmt@mtunb.ATT.COM (Dave Tutelman)
Newsgroups: comp.sys.ibm.pc
Subject: Screen-writing speed & snow elimination
Keywords: BIOS snow display screen
Message-ID: <1433@mtunb.ATT.COM>
Date: 12 Mar 89 18:17:20 GMT
Organization: AT&T Bell Labs - Lincroft, NJ
Lines: 743

In the past couple of months, there have been a number of notes (and responses) on the subjects of:

- Snow on the screen, and code to eliminate it.
- Why the BIOS is so slow.
- Fast routines to write to the screen.

I've recently had the occasion to check the performance of various screen-writing routines (including the BIOS). The attached paper is (1) a tutorial on snow elimination techniques, and (2) the results of my performance measurements. Enjoy!

+---------------------------------------------------------------+
| Dave Tutelman                                                 |
| Physical - AT&T Bell Labs - Lincroft, NJ                      |
| Logical - ...att!mtunb!dmt                                    |
| Audible - (201) 576 2442                                      |
+---------------------------------------------------------------+

SNOW-FREE SCREEN WRITING vs. VIDEO BIOS:
TUTORIAL AND PERFORMANCE TESTS

Dave Tutelman
16 Tilton Drive
Wayside, NJ 07712
(201) 922 - 9576

1. Principles of Snow Elimination
   1.1. Theory
   1.2. Practice
2. Performance
   2.1. Test Measurements
        2.1.1. Basic Measurements
        2.1.2. Effect of Snow Elimination
        2.1.3. Effect of String Writes
        2.1.4. Effect of Pointer Computation
   2.2. Why is BIOS so Slow?

Screen Writing                2-13-89                - 2 -

1. Principles of Snow Elimination

If you write programs for MSDOS PCs, you face an interesting dilemma: how to write to the screen.

- If you use the BIOS, you will take a performance hit; it's _slow_.
- If you write directly to video RAM to speed it up, you have to write different code for each kind of video display. And some displays have an added difficulty, "snow", which is notoriously hard to eliminate.
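
Writing directly to video RAM means working out where each character cell lives, and that is where the per-display differences start. The sketch below assumes the standard IBM text-mode layout: monochrome RAM at segment 0xB000, CGA RAM at 0xB800, two bytes per cell (character, then attribute), and the current BIOS video mode as the selector. The helper names video_segment, cell_offset and cell_word are hypothetical, not from any library, and the cell helpers only make sense in the text modes (0-3 and 7).

```c
#include <stdint.h>

/* Text-mode video RAM segment for a BIOS video mode (the mode byte the
   BIOS keeps at 0040:0049). Mode 7 is the monochrome adapter; modes 0-6
   are CGA or CGA-compatible. Returns 0 for modes this sketch ignores. */
static uint16_t video_segment(int mode)
{
    if (mode == 7)
        return 0xB000;               /* monochrome display adapter */
    if (mode >= 0 && mode <= 6)
        return 0xB800;               /* color graphics adapter */
    return 0;                        /* unknown mode */
}

/* Byte offset of the character cell at (row, col) on a text screen that
   is `cols` characters wide (80 or 40 on a CGA). Each cell is two bytes. */
static uint16_t cell_offset(int row, int col, int cols)
{
    return (uint16_t)((row * cols + col) * 2);
}

/* The word to store at that offset. The 8086 is little-endian, so the low
   byte (the character) lands first and the high byte (the attribute) second. */
static uint16_t cell_word(unsigned char ch, unsigned char attr)
{
    return (uint16_t)(((unsigned)attr << 8) | ch);
}
```

For example, the bottom-right cell of an 80x25 screen is at offset (24*80+79)*2 = 3998 (0x0F9E); on a CGA in mode 3, an 'A' in bright white on blue is the word 0x1F41 stored at B800:0F9E.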

Snow is the visual noise that appears on the screen of certain displays when a program reads or writes to the video RAM. The CGA (IBM Color Graphics Adapter) is particularly snowy, but is hardly the only offender. This note discusses where snow comes from, and how to eliminate it by writing to video RAM during retrace. It also gives some detailed performance measurements that show how much speed can be gained by avoiding the BIOS calls; improvements of a factor of ten are common, and the gain can be as high as a factor of seventy.

1.1. Theory

When IBM introduced the Color Graphics Adapter (CGA), they made an unfortunate design decision. A display adapter needs to read from its memory "as needed" by the raster sweep, and to accept writes to that memory "as needed" by the CPU. To save money on the board, they didn't do this with a true dual-port memory; instead, they allowed the CPU to take precedence over the raster when both try to access the video RAM on the same memory cycle. During such cycles, the video generator can't read from memory, and doesn't know what video signal to put out. So it guesses, and usually guesses wrong (hey, what's the chance of getting all eight bits right?). Wrong guesses look like snow on the screen.

IBM overcame this hardware deficiency in software. They wrote the video driver in the BIOS so that it writes to memory only when the video beam is turned off. At the end of each horizontal line on the screen, the adapter turns off the beam and allows it to _retrace_ back to the left edge of the screen to begin the next horizontal line. It is possible to do a "snowless" write to video memory if you do it only during the retrace. You can also use the vertical retrace (which lasts much longer), while the beam is returned from the bottom of the screen to the top. The BIOS writes to the screen only during horizontal or vertical retrace, and so can your program.
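
Software can see the retrace directly, because the adapter reports it in a status register. As a minimal sketch, assuming the standard CGA layout of input port 0x3DA (bit 0 set while the display is inactive during horizontal retrace, bit 3 set during vertical retrace), here is how a status byte can be classified once it has been read. The macro and function names are illustrative only; actually reading the port needs a compiler-specific call such as Turbo C's inportb().

```c
#include <stdbool.h>

/* CGA status register: read with an IN instruction from this port. */
#define CGA_STATUS_PORT    0x3DA

/* Bit 0: display inactive (set during horizontal retrace). */
#define STATUS_H_RETRACE   0x01
/* Bit 3: vertical retrace in progress. */
#define STATUS_V_RETRACE   0x08
/* Either bit: 0x09, the combined mask for "safe to touch video RAM". */
#define STATUS_ANY_RETRACE (STATUS_H_RETRACE | STATUS_V_RETRACE)

/* Classify one status byte that has already been read from the port. */
static bool in_h_retrace(unsigned char status)
{
    return (status & STATUS_H_RETRACE) != 0;
}

static bool in_v_retrace(unsigned char status)
{
    return (status & STATUS_V_RETRACE) != 0;
}

/* True when a write to video RAM would not produce snow. */
static bool safe_to_write(unsigned char status)
{
    return (status & STATUS_ANY_RETRACE) != 0;
}
```

Bits 1 and 2 of the same CGA register belong to the light pen, which is why the tests mask the byte rather than compare it as a whole.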

It is possible to tell when the display is retracing, because a bit in the display adapter's status register is 1 during horizontal retrace, and another bit is 1 during vertical retrace. Let's show some actual code to do such a write. Our first example will be simple (really, naive): it looks good but doesn't work. We will evolve our code until we have a working snowless write.

Suppose we want to write a word _vidword_ to video RAM. We've saved the offset in video RAM in the variable _vidoffset_. (We'll use the conventions of the "C" language, and Turbo C where we can't say things "portably".) We know that:

- The segment part of the base address of the CGA is 0xB800.
- The CGA's status register is at input port 0x3DA.
- The horizontal retrace bit is bit 0; the vertical retrace bit is bit 3. Thus the combined retrace mask on the status byte is 0x09.

Thus the C code to do a snow-free write might be:

    #include <dos.h>    /* Turbo C: inportb(), poke() */

    /* Just spin until a retrace bit turns on. */
    while ((inportb(0x3DA) & 0x09) == 0)
        ;
    /* We're in retrace; write it. poke() takes segment, offset, value. */
    poke(0xB800, vidoffset, vidword);

Might be, but isn't. Unfortunately, probability says you'll encounter a horizontal retrace much more often than a vertical retrace, and the horizontal one lasts a very short time. The code above looks tight, but it's not nearly tight enough; by the time it actually writes to video RAM, the retrace will be over. If you ever encounter a program where the "snow" is in a vertical band a fixed distance from the left edge of the screen, you'll be observing a failed attempt at snow elimination. Instead of removing the snow, it just _synchronizes_ it to the horizontal sweep.

The table below shows the sweep characteristics of a CGA display. Other displays have similar characteristics, varying from the CGA by a factor of less than two.

                                     Horizontal    Vertical
                                     ----------    --------
    Sweeps per second                    15,750          60
    Duration of retrace (microsec)           10        2158

Thus the horizontal retrace is a more attractive target (we can write to the screen more frequently), but a much harder one to hit than the vertical retrace. The next section shows the programming techniques to catch the retrace as frequently as possible, in order to maximize our throughput to the screen.

Before proceeding, however, it's worth mentioning another technique for snow elimination: turning off the video altogether for a short burst of screen writes. The video can be disabled by simply turning off a bit in the mode register of the display adapter. It can stay off for the time required to write about 250 characters (three lines on the screen), according to Brad Davidson (Usenet message number 1884@druxq.UUCP, August 6, 1985). However, it's not clear how often you can do this before the flicker becomes annoying. Since you have to wait for vertical retrace to start, you couldn't possibly do this more than 30 times a second (assuming you could get away with turning off the display every other frame); that's a maximum rate of 7500 characters per second. The use of retrace gives about the same rate, but without the program complexity of having to buffer three lines and send a burst to the screen. For this reason, I won't discuss it any further here.

1.2. Practice

I'll start off assuming that you want your code to run across a variety of DOS machines, including 4.77 MHz PCs and clones. If you're going to try for the horizontal retrace, you have to detect it and use it in under 10 microseconds (less than 50 machine cycles on a "slow" PC or XT). You will need to program it in assembler. Pascal or C won't give you the speed to catch the horizontal retrace.

The reason is that you must point a register pair (say, ES:DI, so we can use a "string-move" instruction) at the target area in video RAM _before_ you start to look for H-retrace; if you find retrace and _then_ load the pointer into the register pair, you'll be late in writing. It's impossible to express this constraint in Pascal or C; you have to code it yourself in assembler.

Once again, we want to write _vidword_ to location _vidoffset_. Suppose we've also stored the video segment in _vidsegment_ and the port address in _vidport_. We've even been clever enough to put vidoffset and vidsegment in adjacent words (offset first, as LES expects), so we can load them into registers with a single instruction. The code we have so far is:

        mov  dx,vidport                 ;Address of status register
        mov  cx,vidword                 ;Data to be written
        les  di,dword ptr [vidoffset]   ;Load pointer into ES:DI
        mov  ah,9                       ;2-bit mask for V+H retrace
    wait_retrace:
        in   al,dx                      ;Get the status register to AL
        test al,ah                      ;V+H retrace mask
        jz   wait_retrace               ;Loop till either bit turns on
        mov  ax,cx                      ;Put the data in the accumulator
        stosw                           ;Do the screen write

This is a lot better than our last try. We have much less "synchronized snow" than we had before, but it's not all gone. What's wrong? Well, the code to detect horizontal retrace and use it is now shorter than the retrace itself, so we're doing some snow reduction. However, there's a strong possibility that we'll start to search for H-retrace while we're already in it, and even late in it. If we start our loop late in the H-retrace, we'll be out of it before we do the write; hence the remaining snow at the left edge of the screen.

To eliminate all the snow, we must first make sure we're _out of the retrace_ before we start to look for it. That way we'll be sure to find it early in the retrace. The code below does that.

        mov  dx,vidport                 ;Address of status register
        mov  cx,vidword                 ;Data to be written
        les  di,dword ptr [vidoffset]   ;Load pointer into ES:DI
        mov  ah,9                       ;2-bit mask for V+H retrace
    wait_sweep:
        in   al,dx                      ;Get the status register to AL
        test al,1                       ;H-retrace?
        jnz  wait_sweep                 ;... Yes. Wait till it turns off
    wait_retrace:
        in   al,dx                      ;Get the status register to AL
        test al,ah                      ;V+H retrace mask
        jz   wait_retrace               ;Loop till either bit turns on
    do_it:
        mov  ax,cx                      ;Put the data in the accumulator
        stosw                           ;Do the screen write

This is almost the code that I use in my routines. However, to enhance performance, I add a few refinements:

- I keep a static variable called _snowok_, which I set to 1 for display adapters that don't put snow on the screen. If this variable is set, I bypass the snow suppression code.
- Before I check for the horizontal sweep, I look to see if we're in vertical retrace. The V-retrace lasts about two milliseconds, and the retrace bit turns off long before the beam turns back on, so we can write to video RAM any time we see it.
- In the H-retrace wait, I replace the TEST AL,1 / JNZ pair with RCR AL,1 / JC, which saves a couple of machine cycles.

The resulting code is:

        mov  dx,vidport                 ;Address of status register
        mov  cx,vidword                 ;Data to be written
        les  di,dword ptr [vidoffset]   ;Load pointer into ES:DI
        mov  ah,9                       ;2-bit mask for V+H retrace
        test snowok,1                   ;Snow-free adapter (snowok set)?
        jnz  do_it                      ;... Yes. Skip snow suppression
        in   al,dx                      ;Read the status register
        test al,8                       ;Vertical retrace in progress?
        jnz  do_it                      ;... Yes. Don't need to wait
    wait_sweep:
        in   al,dx                      ;Get the status register to AL
        rcr  al,1                       ;Bit 0 to carry; faster than TEST AL,1
        jc   wait_sweep                 ;Still in H-retrace. Wait
    wait_retrace:
        in   al,dx                      ;Get the status register to AL
        test al,ah                      ;V+H retrace mask
        jz   wait_retrace               ;Loop till either bit turns on
    do_it:
        mov  ax,cx                      ;Put the data in the accumulator
        stosw                           ;Do the screen write

Before we move on to performance measurements, however, I'd like to comment on a few differences between this and a code fragment recently posted by Ward Christensen (Usenet message 5082@phoenix.Princeton.EDU, January 2, 1989). Ward attributes the code to "FASTWRITE", by an author whose name he's forgotten.

- Ward recommends bypassing the snow elimination code if the board is a monochrome board instead of a CGA; he claims that the retrace bit doesn't toggle on a mono board, so the program will hang. In my experience, bypassing snow removal for a mono board is a good idea, but _not_ because it hangs the program; it doesn't. Rather, the mono board doesn't have inherent snow, because its video memory is better designed, so without the retrace checks the program will run a lot faster. As Ward points out, the best way to decide what to do is to look at the video mode variable in the BIOS. If the mode is 7, it's a mono board. If the mode is 0 to 6, it's a CGA (or another display in CGA-compatible mode).
- FASTWRITE recommends turning off interrupts while looking for retrace and writing to the screen. FASTWRITE (like other snow removal programs I've seen) does this by surrounding the snow removal code with a CLI-STI instruction pair. I oppose this in principle, and find it unnecessary in practice. I believe that interrupts should be turned off only if the system or application would be corrupted by the occurrence of an interrupt (e.g., while switching stacks, or while unloading the received byte from a UART before the next byte overwrites it). But what is the consequence of being interrupted in our routine? Either a spot of snow on the screen or a delay in the output of a character.
  Either is invisible if interrupts are infrequent events; neither threatens the integrity of any other operation of the computer.
- FASTWRITE doesn't treat vertical retrace as a special case. This slows the "screen throughput" by 5%-10%, as measured by the techniques in the next section.

2. Performance

2.1. Test Measurements

2.1.1. Basic Measurements

I made a series of measurements of "screen throughput" in characters per second, on several MSDOS Personal Computers. Throughput was measured by sending 20,000 to 100,000 characters to the screen, and timing the duration with a stopwatch. The calling program was written in C, and made repeated calls to a function _dputc (y, x, character, color)_, which was itself coded in assembler. The calling program was, roughly speaking:

    for (n=0; n