Xref: utzoo comp.unix.questions:19804 comp.unix.wizards:20540
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!wuarchive!mit-eddie!aryeh
From: aryeh@eddie.mit.edu (Aryeh M. Weiss)
Newsgroups: comp.unix.questions,comp.unix.wizards
Subject: Re: Help-Bus Errors
Keywords: what causes them
Message-ID: <1990Feb10.192028.16025@eddie.mit.edu>
Date: 10 Feb 90 19:20:28 GMT
References: <1810@lzga.ATT.COM>
Reply-To: aryeh@eddie.MIT.EDU (Aryeh M. Weiss)
Organization: MIT EE/CS Computer Facilities, Cambridge, MA
Lines: 46

In article <1810@lzga.ATT.COM> bogatko@lzga.ATT.COM (George Bogatko) writes:
>Help. We have a program that we do not have source for that is dumping
>core with Bus Error. Does anybody have, or can point me to, a list of
>what causes the major core dumps, I.E. Bus, EMT, Memory Fault, etc.

A program dumps core when it receives a signal that it is not currently catching or ignoring and whose default action is to dump core. (The signals that cause core dumps are SIGQUIT, SIGILL, SIGTRAP, SIGIOT, SIGEMT, SIGFPE, SIGBUS, SIGSEGV, and SIGSYS.) Any of these signals can be sent to a process via kill(2S).

SIGQUIT is usually caused by the quit key (^\). SIGILL is caused by execution of an illegal instruction (this may indicate a trashed stack that made a procedure return to a random location in the code).

SIGTRAP, SIGIOT, and SIGEMT are caused by executing special processor instructions. The names are throwbacks to the PDP-11 days: each is named after an instruction in the PDP-11 instruction set. These are obviously machine dependent, but they seem to have equivalents on various popular hardware platforms (Vaxes, 68000, 80x86). Trap instructions are used by debuggers to set breakpoints in the code of a process being traced, but I don't know how they interact with SIGTRAP when used for this purpose.

SIGFPE is caused by floating point errors, such as divide by 0, overflow, and (on Intel x86/x87 systems) FPU stack overflows (Xenix 386 users may be familiar with this last one).
Now the tricky ones:

SIGSEGV is caused when a process addresses a location outside its (code or data) address space. This is typically caused by overrunning an array, by incrementing (and dereferencing) a pointer beyond the end of process memory, and, most familiar to all programmers of non-Vax Unix systems, by dereferencing the dreaded NULL pointer.

SIGBUS errors are quite machine dependent, but in my experience they have two causes: (1) a reference to an impossible machine address (this would occur on 68000 systems if you went beyond address 2^24, and may occur on 386/286 systems if you load a segment register with an absurd segment number), and (2) a reference to an odd address with a word-oriented instruction (this is a no-no on Vaxes and 68000s, but 80x86 systems don't mind).

SIGSYS is for bad arguments to a system call, but this has never happened to me and I do not know how bad the argument has to be. Illegal addresses passed to system calls generally come back to the calling process as an error code, so I don't know exactly how to get one of those (this may be another throwback to the olden days of yore).

>Please, no flames.

This question certainly comes under the heading of things your mother (and the manuals) never told you.
--