Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!houxm!whuxl!whuxlm!akgua!gatech!seismo!harvard!think!mit-eddie!cybvax0!vcvax1!tom
From: tom@vcvax1.UUCP
Newsgroups: net.micro.pc
Subject: Suspected "popf" bug in Intel 80286 (long)
Message-ID: <179@vcvax1.UUCP>
Date: Fri, 18-Apr-86 11:44:12 EST
Article-I.D.: vcvax1.179
Posted: Fri Apr 18 11:44:12 1986
Date-Received: Mon, 21-Apr-86 08:16:33 EST
Distribution: net
Organization: VenturCom Inc., Cambridge, MA
Lines: 142

While experimenting with an asynchronous communication driver for VENIX
(in protected mode) on the IBM PC/AT, I encountered some rather strange
behavior that I now attribute to a bug in the Intel 80286 processor.
In brief, I suspect that the "popf" instruction enables interrupts under
certain circumstances even though the IF flag is 0 before the
instruction is executed and is set to 0 by the "popf" instruction itself.

Because "popf" is so often used successfully, I would like to hear from
others about whether they have encountered the same or a similar
problem. If so, I would like to know how they programmed around it.
My concern is that either I'm mistaken and there is no hardware bug, or
I'm correct and the fix that I found does not apply to all processor
lots.

The Evidence

The problem shows up in the asynchronous driver as a loss of characters
on OUTPUT. Only single characters are lost every now and then. The
problem can appear when kernel printf's indicate that no interrupts
other than transmit interrupts are occurring. I have tried 2 different
AT's with the same results.

The character loss turns out to be caused by a completion interrupt
occurring while the tty startup routine is running. The startup routine
disables interrupts and stuffs a character into the transmit buffer.
But for some reason, the 80286 allows the COM port to interrupt, causing
the transmit interrupt routine to overwrite the character just put in
the buffer. Kernel printf's triggered by sanity checks in the driver
indicate that the instruction being interrupted is a "popf" and that the
IF flag before and after the "popf" is 0.

A number of obvious checks were done to rule out programmer error. For
example, after a kernel was demonstrated to show the bug, I dumped the
code segment of the running kernel and compared it to the object file.
No difference.

The Fix

Since the source of the interrupt was always a particular "popf" (in the
splx() routine), I concentrated on recoding the kernel where the "popf"
occurs. To convince myself that the symptoms were not due to a bug in
the loader, I recoded the kernel using "adb" on the object file and then
booted the modified object file. Therefore, changes in behavior are
directly correlated with code changes (and not with changes in linking
the kernel, compilation, size of kernel, etc.).

The following code sequences for the routine splx() were tried and
FAILED. The first is the original:

1)      pop cx                  | return address
        popf                    | new flags
        pushf                   | dummy arg
        jmp cx                  | return

2)      pop cx
        nop, nop, nop, nop      | Padding for timing.
        popf
        pushf
        jmp cx

3)      pop cx
        popf
        push cx                 | Could "pushf" be messing up "popf" ?
        jmp cx

4)      pop cx
        popf
        push cx
        push cx                 | Could "jmp cx" be messing up "popf"?
        ret

5)      pop cx
        xor ax,ax               | Dummy register access.
        popf
        push cx
        jmp cx

6)      pop cx
        mov ax,#0               | Dummy memory access.
        popf
        pushf
        jmp cx

Perhaps the reasons for testing the above code will be made clear by the
following codings of splx(), which FIX the problem:

7)      pop cx
        pop ax
        push ax
        test ax,#0x200          | Don't use popf!
        bne Lsplx
        cli
        jmp cx
Lsplx:  sti
        jmp cx

8)      pop cx
        pop ax
        and ax,#0x7FD5          | Mask off "don't care" bits.
        push ax
        popf
        pushf
        jmp cx

9)      pop cx
        pop ax
        and ax,#0xFFFF          | Does masking really work?
        push ax
        popf
        pushf
        jmp cx

10)     pop cx
        pop ax                  | Is it the "and" that does it?
        push ax
        popf
        pushf
        jmp cx

11)     mov bx,sp
        push *2(bx)
        popf
        ret

Remember that the kernel was patched with each of the above codings of
splx(). Those that worked, worked for as long as I watched them
(several minutes at 9600 baud). Those that failed, failed every few
seconds or so. The interrupted instruction was always the "popf".

Tom Scott
VenturCom, Inc.
..!seismo!harvard!cybvax0!vcvax1!tom
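
The flag arithmetic behind codings 7 through 9 can be checked outside
the kernel. What follows is a minimal Python sketch, not kernel code and
not one of the tested patches. The names IF_BIT, DEFINED_FLAGS,
splx_without_popf and cleared_bits are made up for illustration. It
assumes only the documented 80286 FLAGS layout, in which IF is bit 9
(0x200) and bits 1, 3, 5 and 15 are not defined flags.

```python
# Assumption: documented 80286 FLAGS layout. IF is bit 9.
IF_BIT = 0x0200          # the bit tested by "test ax,#0x200" in coding 7
DEFINED_FLAGS = 0x7FD5   # the mask used by "and ax,#0x7FD5" in coding 8


def splx_without_popf(saved_flags):
    """Hypothetical model of coding 7: name the instruction it executes.

    Coding 7 never reloads the flags word; it only looks at IF in the
    saved value and then runs either "sti" or "cli".
    """
    return "sti" if saved_flags & IF_BIT else "cli"


def cleared_bits(mask, width=16):
    """List the bit positions that "and ax,#mask" forces to zero."""
    return [bit for bit in range(width) if not mask & (1 << bit)]


if __name__ == "__main__":
    print(splx_without_popf(0x0202))    # saved flags with IF set
    print(splx_without_popf(0x0002))    # saved flags with IF clear
    print(cleared_bits(DEFINED_FLAGS))  # bits dropped by coding 8
    print(cleared_bits(0xFFFF))         # bits dropped by coding 9
```

Under those assumptions the 0x7FD5 mask in coding 8 clears only bits 1,
3, 5 and 15, and the 0xFFFF mask in coding 9 clears nothing, so neither
mask changes the value of IF that "popf" loads.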