Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!haven!adm!news
From: pauld@scenic.wa.com (Paul Davis)
Newsgroups: comp.unix.wizards
Subject: Unix V/386 driver help (LONG)
Message-ID: <24100@adm.BRL.MIL>
Date: 9 Aug 90 18:39:20 GMT
Sender: news@adm.BRL.MIL
Lines: 204

This message has been cross-posted to the following groups:

    386users
    info-ibmpc
    Sun-386i
    Unix-Wizards

IMPORTANT CAVEAT
*******************

This message concerns a difficult problem with a Unix V/386 device driver for a RISC coprocessor based on AMD's 29000 chip. If you do not have low level 80386, AT Bus and/or Unix V/386 experience, you should probably skip the message.

BACKGROUND
**********

We have written a Unix V/386 device driver for this coprocessor. The board provides between 16 and 25 MIPS, and 4 M-Whetstones of computing horsepower.

AMD specify a "High Level Interface" (HIF) for people developing for boards based upon the 29000. The HIF specifies a set of available system traps that emulate several Unix system and library calls. Some of them can be satisfied by a kernel that runs on the 29000, but others require the services of the host 386 operating system. In practice this gives the programmer access to regular Unix-like I/O to and from the host filesystem, as well as a number of system calls such as time(), and library calls such as getenv(). The net effect is that development for the board can be carried out under Unix (or DOS) and then, once the code runs, simply recompiled for the 29000.

The device driver, however, has to support these HIF requests, particularly those concerned with file I/O. It is obviously not entirely normal for a driver to do file I/O ..... For commercial reasons, I can't discuss here exactly what solution was reached to solve this problem, but it's not relevant, since the problems discussed below exist even in a miniature version that does not support file I/O.

The board is targeted as a RIP for PostScript work, and also as a high speed general coprocessor. We are using it for both purposes, both to drive an 800x400 dpi laser engine, and to carry out some pretty intensive and proprietary image processing (halftoning, grey scaling and compression).

The driver is operational, at least to the extent that I can run Whetstone tests, our image processing code, a clone of PostScript (with some provisos), and a variety of other programs.

You should be aware that the company that designed this board did not design it with the AT bus spec in mind, inasmuch as they do not pulse the interrupt line. Instead, the line stays high until the interrupt handler reads from a register on the board. We have hassled them A LOT about this, but under DOS, it seems to cause no problems. Using a kernel debugger, I have verified that our Unix V/386 system (Interactive's 386/ix) is indeed initializing the PIC in edge-triggered mode, and there is other confirmatory evidence of this too.

THE BEAST
*********

When a program running on the 29000 makes a call that requires the services of the host (such as time()), the board generates an interrupt. The interrupt handler reads both a register and a memory location on the board to determine the nature of the request. It services the request, writes some status values to memory on the coprocessor, and everyone is happy. The interrupts are actually generated by a kernel that runs on the 29000 and which in itself has to generate a few extra interrupts at startup and termination (see below).

Consider the following program:

    main (argc, argv)
    int argc;
    char *argv[];
    {
        int intcnt;

        intcnt = atoi (argv[1]);
        while (intcnt--)
            time ();    /* each call interrupts the host once */
        exit (0);
    }

The program, with an argument of 1, generates 8 interrupts. These correspond to the following requests, made either by the on-board kernel or by the above program:

    REQUEST  FUNCTION  SOURCE        REASON
    20       WRITE     29000 kernel  (startup message)
    20       WRITE     29000 kernel  (startup message part II)
    66       COPYARGS  29000 kernel  (get arguments for program)
    49       TIME      program       (get time)
    18       CLOSE     program       (close stdin)
    18       CLOSE     program       (close stderr)
    18       CLOSE     program       (close stdout)
    1        EXIT      program       (exit)

You will hopefully see that this program generates (n+7) interrupts, where n is argv[1]. It calls time() n times, each of which generates an interrupt, and also has the overhead of the startup and shutdown interrupts regardless of the value of argv[1].

All is fine. HOWEVER, if the argument is increased, then weird things begin to happen. When I say weird, I mean that the machine reboots. Just like that. Reboots ......

What is the value of argv[1] when this happens? Good question. It varies between 200 and 1000. It does not seem to be deterministic.

However, if I jump into the kernel debugger at some point whilst this program is running (which is pretty difficult with a 25 MIPS board :-)), then at certain times, I see a LARGE number (30-60) of traps built up when I issue the "stack" command. The traps are of the same type as the interrupt vector used by the board, i.e. if the board uses IRQ 10, there will be lots of "trap A"s.

What seems doubly weird is that if I look at the stack I notice:

i) only the trap on the top of the stack calls cmnint(), which from my disassembler adventures with the kernel debugger is what actually calls the interrupt handler defined in ivect (declared in config.c, built and compiled at kernel build time).

ii) the rest call some other function.

iii) what this function is seems to depend upon the IRQ used by the board. If I use IRQ 10, then the extra traps on the stack call timein(), whose function appears to be checking the timeout() stack (the array "callout") for functions that should be called after a given period of time.
However, if the board uses IRQ 15, then the extras call clock_int(), although a call to timein() is also on the stack, apparently from the call to clock_int() generated by the prior trap.

SOME QUESTIONS:
***************

1) What causes a given trap, of a given type, to call a particular kernel function instead of another? Is this vectored by some low level 386 hardware stuff, or is there a layer of the Unix kernel that routes this?

2) Does the presence of 30-60 traps built up on the stack indicate a real problem? Can the debugger be trusted when it reports this information?

3) Could an overflow of the stack reboot the machine? If I single step through the kernel when the stack builds up like this, I can reboot the machine on a single machine instruction .... I don't know what that instruction is, however ...

4) Could a board that generated "level-triggered" interrupts cause multiple traps in this way? When I say level-triggered, I mean that the board does NOT generate a pulse, but instead drives its interrupt line high until something reads from one of its registers. We (and the board's makers) tried testing this under DOS, and could not generate multiple interrupts, but that's not necessarily a fair test.

5) What the hell is going on?

I appreciate that there is a lot here to digest and a lot of areas for problems to arise. I have had some Unix driver experience before, but simply do not have the knowledge or the access to the kernel source to know what is happening at this level. There seem to be extremely few people around who have the kind of experience and knowledge to deal with this type of question: DOS folk don't know anything about the way Unix handles interrupts, whilst Unix folks tend not to have a very deep knowledge of the interactions between hardware and the kernel.

Basically, we are stuck with this, and any assistance or help you can offer will be much appreciated.

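To make the distinction in question 4 concrete, here is a toy C model of an interrupt controller sampling a request line that a board holds high. It is only a sketch under stated assumptions: it is not driver code, it does not model the real PIC or anything in 386/ix, and the names count_deliveries, trigger_mode and hold_samples are invented for the illustration. It also assumes the controller is re-armed immediately after every delivery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is a toy model, not the real PIC. */
enum trigger_mode { EDGE_TRIGGERED, LEVEL_TRIGGERED };

/*
 * Count how many interrupts a controller would deliver when the board
 * holds its request line high for hold_samples consecutive samples and
 * then drops it (i.e. the handler finally reads the board register).
 * Assumption: the controller is ready again right after each delivery.
 */
int count_deliveries(enum trigger_mode mode, int hold_samples)
{
    bool last_line = false;   /* line state at the previous sample */
    int delivered = 0;
    int t;

    assert(hold_samples >= 0);

    /* One extra iteration so the model also samples the line going low. */
    for (t = 0; t <= hold_samples; t++) {
        bool line = (t < hold_samples);
        bool request;

        if (mode == EDGE_TRIGGERED)
            request = line && !last_line;   /* low-to-high transition only */
        else
            request = line;                 /* every sample that sees it high */

        if (request)
            delivered++;
        last_line = line;
    }
    return delivered;
}
```

Under these assumptions, count_deliveries(EDGE_TRIGGERED, 5) returns 1 and count_deliveries(LEVEL_TRIGGERED, 5) returns 5. The model says nothing about where the stacked traps described above actually come from.
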
I do not subscribe to any of the lists to which this has been submitted, so please reach me by mail at:

    pauld@scenic.wa.com

Thanks for your time and expertise.

-- paul

Paul Barton-Davis
ScenicSoft, Inc.
(206) 776-7760
"Industry without art is brutality"