Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!rutgers!tut.cis.ohio-state.edu!ucbvax!RICHTER.MIT.EDU!krowitz
From: krowitz@RICHTER.MIT.EDU (David Krowitz)
Newsgroups: comp.sys.apollo
Subject: Re: Mystery Error
Message-ID: <8904271524.AA03609@richter.mit.edu>
Date: 27 Apr 89 15:24:54 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 94

Ok, guys, here's the scoop in full public view ...

If your program trashes its stack, or if another program running in
the same process (a la SR9 or SR10 with the inprocess switch set)
trashes the stack, /com/tb will fail. There is no way around it. The
information that tb prints out is stored on the stack, and once you've
trashed the stack there ain't noth'un to print. Debugging programs
which have messed up the stack is intrinsically difficult because of
the nature of the error -- it erases the debugging info.

The common causes of a bad stack are:

1) referencing a variable via a pointer which has not been set up
   correctly. Both reading *and* writing variables via a bad pointer
   can result in trashing the stack: a bad pointer can make your
   program write over existing info on the stack, and it can also make
   your program read values that are not legitimate and then use those
   values (pointers, loop indices, etc) in further computation, which
   can go out of control.

2) mismatched argument lists in calls to external subroutines and
   functions. If you pass a 16-bit integer to a subroutine which
   thinks it is a 32-bit integer, and the subroutine stores a value
   into that variable, then you will have trashed your stack.
   Subroutine and function arguments are passed on the stack. In C,
   they are passed by value (ie. the actual value of the variable is
   on the stack), so if you write a 32-bit value into a 16-bit space,
   you wipe out whatever followed it on the stack.
   Since the return address for the subroutine is also stored on the
   stack, you can destroy it, and your subroutine will return to
   never-never land rather than to the calling program. With Pascal
   and Fortran, subroutine arguments are passed by reference (ie. the
   address of the variable is on the stack), so if you write a 32-bit
   value to a 16-bit space, you don't trash the stack immediately.
   BUT! Subroutines store their local variables on the stack in
   addition to the arguments that were passed into them. If a
   subroutine calls a second subroutine, passing some local variables
   to that 2nd subroutine, and if that second subroutine then writes a
   32-bit value to one of those local variables, and if that local
   variable was actually a 16-bit value, THEN the 2nd subroutine will
   trash the stack of the FIRST subroutine! (and the first subroutine
   will return to never-never land as a result of an action by the
   second subroutine ... nifty error, eh?)

3) of course, there's always the good old standby of allocating an
   array which is smaller than what you actually use (or of messing up
   the calculation of a loop index and simply writing outside of the
   allocated array). If the array is a local variable in a subroutine,
   then it was allocated on the stack, and writing outside of the
   bounds of the array will overwrite the stack (and of course reading
   outside of the bounds will give you garbage values which can cause
   something else to go haywire). If the array is statically allocated
   (a global variable, a Fortran COMMON block, etc), you can still
   trash the stack if you go far enough outside the bounds of the
   array.

How do you debug one of these monsters? First, you recognize the
symptoms (can't unwind stack ...). Then you start putting print
statements (or breakpoints with the debugger) scattered about your
program so you can see how far the program got before it blew up. Use
the -dba or -dbs switches along with the -subchk switch to look for
array overflows.
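
To make causes 2 and 3 concrete, here is a toy model in Python (not
Apollo code, and not anything taken from the compilers). A bytearray
stands in for one subroutine's stack frame, holding a two-element array
of 16-bit integers followed by a saved return address. The frame layout,
the big-endian packing, and every function name are assumptions made up
for the illustration. checked_store16 is only a rough stand-in for the
kind of subscript test that a switch like -subchk asks for; how the
compiler really implements that test is not shown here.

```python
import struct

# Toy model only: a bytearray stands in for one subroutine's stack frame.
# The layout is hypothetical, not the real Apollo frame layout:
#   bytes 0-3  local array of two 16-bit integers
#   bytes 4-7  saved return address
# Big-endian packing (">") is an arbitrary choice for this model.
ARRAY_LEN = 2
RETURN_OFFSET = 4


def new_frame(return_address):
    """Build a frame with a zeroed array and the given return address."""
    frame = bytearray(8)
    struct.pack_into(">I", frame, RETURN_OFFSET, return_address)
    return frame


def saved_return(frame):
    """Read back the return address stored in the frame."""
    return struct.unpack_from(">I", frame, RETURN_OFFSET)[0]


def store16(frame, index, value):
    """Unchecked 16-bit store into array[index] (no bounds test)."""
    struct.pack_into(">h", frame, 2 * index, value)


def store32(frame, index, value):
    """What a callee does when it believes array[index] is 32 bits wide."""
    struct.pack_into(">i", frame, 2 * index, value)


def checked_store16(frame, index, value):
    """The same 16-bit store, guarded by a subscript check."""
    if not 0 <= index < ARRAY_LEN:
        raise IndexError("subscript %d outside 0..%d" % (index, ARRAY_LEN - 1))
    store16(frame, index, value)


if __name__ == "__main__":
    frame = new_frame(0x00401234)
    store16(frame, 1, 7)      # in bounds, correct width: frame intact
    print(hex(saved_return(frame)))
    store32(frame, 1, 7)      # width mismatch: spills into bytes 4-5
    print(hex(saved_return(frame)))
    frame = new_frame(0x00401234)
    store16(frame, 2, 7)      # subscript one past the end of the array
    print(hex(saved_return(frame)))
```

In this model both the too-wide store and the out-of-range subscript
change the saved return address from 0x401234 to 0x71234 without any
complaint at the time of the store, which is why the failure only shows
up later, when the subroutine tries to return.
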
Note that turning debugging on/off can change the error, because the
addition of the debugging info to the executable program will change
where the various variables are located in memory. You can cause the
error to go away altogether by turning on the debugging info (because
the program winds up trashing some debugging info rather than one of
its own variables or the stack).

Once you have the problem narrowed down to a subroutine, check
everything listed above. PRINT OUT the values of ALL of your critical
variables (pointer values, loop indices, etc). Check them just prior
to their use. They may be ok when you enter the routine, but get
clobbered by some other code before they are used. Adding print
statements to your code to track its progress and to check your
variables and pointers is crude, but effective. Some of the nastier
error sequences will even confuse the debugger into thinking that the
error is elsewhere.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

Maybe this should be a talk at the next ADUS conference?