Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!usc!apple!mips!sgi!shinobu!odin!mds!mds
From: mds@mds.sgi.com (Mark D. Stadler)
Newsgroups: comp.sys.sgi
Subject: Re: Bus error DURING call to malloc()
Keywords: help bus error malloc
Message-ID: <8863@odin.corp.sgi.com>
Date: 12 Jun 90 18:26:11 GMT
References: <14525@thorin.cs.unc.edu> <62083@sgi.sgi.com>
Sender: news@odin.corp.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 60

In article <62083@sgi.sgi.com> yohn@tumult.asd.sgi.com (Mike Thompson) writes:
>In article <14525@thorin.cs.unc.edu>, taylorr@glycine.cs.unc.edu (Russell Taylor) writes:
>>
>>      We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
>> got the proverbial 'Bus error (core dumped)' message.  The catch is that
>> when I run dbx and look for the error, it tells me that the error occured
>> IN malloc():
>> ...
>>      There are several calls to malloc() in the code.  There have been
>> successful calls before this call is made.  All calls are passed constant
>> references, and this code compiles and runs correctly on a variety of other
>> machines (VAX, sun 4, DecStation).
>> ...
>>      Is there a known bug (and hopefully fix) for this?
>
>I cannot guarantee that there are no bugs in malloc (I assume you are
>getting malloc from libc), but I don't know of any (besides performance
>problems when allocating many memory areas).  But I have seen many,
>many user programs that bomb in malloc because the user code overran
>the memory allocated by a call to malloc.  malloc(strlen(s)) and
>copying s is a classic way to get into trouble (user forgets that
>strlen does not account for the trailing null character) -- there are
>many other possibilities.
>
>Since malloc(3X) -- the malloc in /usr/lib/libmalloc.a -- aligns
>requests to eight-byte boundaries and malloc(3C) aligns only to
>four-bytes, switching to libmalloc may help if only that it masks gives
>the caller a little more unrequested rounding space.
>

i've examined a number of malloc() problems over the last 7 years or so,
and have always traced the problem back to the application...

there are a couple of good reasons why malloc() usage problems are masked
differently depending on the machine and on which malloc library you link.

first of all, i know that a number of VMS programs have malloc problems
once they are ported to unix.  the VMS malloc rounds the request up to the
next multiple of 512 bytes (the page size).  then it skips the next virtual
page.  this turns out to be a great debug tool, since you get a core dump
when you hit the skipped page instead of quietly corrupting some other data
structure.  unfortunately, the granularity is only at the page level, so
small overruns are masked and only surface in other environments.  VAX unix
may act similarly, but i don't know for sure.

the traditional libc malloc approach uses a linked list scheme where the
next pointers are embedded in the memory arena.  if you write past the end
of a chunk of malloced memory, you corrupt the linked list and the next
call to malloc() will traverse into the boonies.

the libmalloc approach keeps the pointers into the memory arena in a
separate area.  therefore, if you write past the end of a chunk of malloced
memory, you may corrupt some other data structure that doesn't really
matter anyway... (at least not at the time).  since the next pointers are
saved from corruption, malloc() won't dump core.  but you still have a
problem lurking out there somewhere.

i think i'd stick to the old malloc() and narrow the problem down more.  if
you mask this symptom, you will make it even more difficult to isolate a
problem further down the road.

-- 
mds [aka Mark D Stadler    mds@sgi.com    ...!uunet!sgi!mds    (415)335-1327]
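
a minimal sketch of the embedded-pointer failure described above, combined
with the malloc(strlen(s)) mistake from the quoted article.  everything here
is hypothetical and for illustration only: toy_alloc(), toy_walk() and
overrun_demo() are made-up names, and the one-byte "next offset" header is
an assumption chosen to keep the example small.  it is not the actual libc
or libmalloc layout.  all writes stay inside one static array, so the
overrun is harmless here, but it still clobbers the neighbouring block's
link.

```c
#include <assert.h>
#include <string.h>

#define ARENA_SIZE 64

static unsigned char arena[ARENA_SIZE];
static unsigned char arena_top;   /* offset of the first unused byte */

/* hypothetical toy allocator: a one-byte header holding the offset of the
   next header sits directly in front of each payload, inside the arena */
static unsigned char toy_alloc(unsigned char n)
{
    unsigned char hdr = arena_top;

    assert(hdr + 1 + n < ARENA_SIZE);
    arena_top = (unsigned char)(hdr + 1 + n);
    arena[hdr] = arena_top;            /* link to where the next header goes */
    return (unsigned char)(hdr + 1);   /* payload starts after the header */
}

/* follow the embedded links from the start of the arena; return the number
   of blocks, or -1 if a link fails to move forward (a corrupted chain) */
static int toy_walk(void)
{
    int count = 0;
    unsigned char off = 0;

    while (off != arena_top) {
        unsigned char next = arena[off];

        if (next <= off || next > arena_top)
            return -1;
        off = next;
        count++;
    }
    return count;
}

/* extra = 0 mimics malloc(strlen(s)); extra = 1 mimics malloc(strlen(s) + 1) */
int overrun_demo(unsigned char extra)
{
    const char *s = "hello";
    unsigned char a;

    memset(arena, 0, sizeof arena);
    arena_top = 0;

    a = toy_alloc((unsigned char)(strlen(s) + extra));
    (void)toy_alloc(4);                    /* neighbouring block */

    /* copying the string also copies its trailing '\0'; with extra = 0 that
       byte lands on the neighbouring block's header */
    memcpy(arena + a, s, strlen(s) + 1);

    return toy_walk();
}
```

calling overrun_demo(0) reports a broken chain, while overrun_demo(1) walks
both blocks cleanly.  note that the copy itself never fails; the damage only
shows up on the later walk, which matches the pattern of a crash "in
malloc()" some time after the real mistake.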