Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!ncar!ico!rcd
From: rcd@ico.isc.com (Dick Dunn)
Newsgroups: comp.unix.aix
Subject: Re: malloc (was: making a request to IBM)
Summary: problem is where/how the failure is reported
Keywords: malloc psalloc paging space
Message-ID: <1991Apr18.195549.24077@ico.isc.com>
Date: 18 Apr 91 19:55:49 GMT
References: <1991Apr9.024814.1141@appmag.com> <6644@awdprime.UUCP>
Organization: Interactive Systems Corporation, Boulder, CO
Lines: 78

mbrown@testsys.austin.ibm.com (Mark Brown) writes:
[lost the previous attribution for problem statement]
> | The problem: as you all remember, malloc() returns NULL only
> | when the process exceeds its datasize limit. If malloc returns a
> | non-null pointer, the memory may turn out to be exceedingly
> | virtual...
...
> | Personally, I think it's a bug. If there is no memory left,
> | malloc should return a NULL. IBM says it's a feature, catch
> | SIGDANGER if you don't like it.

The way I read this, the complaint is from the normal-programmer point of
view: there's a defined way to indicate that there's no more memory
available--return NULL from malloc(). SIGDANGER is an IBM invention.

> Yeah, I've heard complaints (and roses) on this one.
> The Rationale: Rather than panic the machine, we'd like for it to keep
> running as long as possible. Hence, we try to keep running at all costs,
> including doing things like this. So, when we do get close to the limit,
> we send a warning, then as we go over we start killing the biggest memory
> users. (Warning - the processes involved have been overly simplified).

As several folks have pointed out, various UNIX systems have had
more-or-less graceless responses to running out of (memory+swap). One
might therefore ask that a new behavior be better, instead of just
different.

The "mistake" (if I may call it that) in what Mark is saying is this: the
overcommitment of memory/page space is a kernel problem.
The kernel created the problem by overallocating, so the kernel (being that
piece of code responsible for allocating/managing the hardware!) should
solve it rather than handing it back to the applications. Look at the
problem from the application point of view.

> The Idea was to make the machine 'more reliable'...

I'll object to the idea that killing some arbitrary process makes the
machine "more reliable". If you want "more reliable", don't overcommit!

> ...Our research led us
> to believe that many processes allocated more memory than actually used in
> page space (I think) and we used this knowledge...

There's something wrong with this. What type of programs were studied in
this "research"? I know that typical style in C is:

	p = (struct whatzit *)malloc(sizeof(struct whatzit));
	...
	p->thing1 = stuff1;
	p->thing2 = stuff2;

where "..." is rarely more than a check for NULL. In other words, memory
that gets allocated is, as a rule, touched almost immediately.

The trouble with SIGDANGER is that it occurs at a time that makes no sense
to the programmer. Just because you happened to touch some particular piece
of memory (and it's unlikely you really know where your page boundaries
are) for the first time...or worse yet, some *other* process touched memory
for the first time!...you get SIGDANGERed up 'side the head? What do you
do? How did you get there? It's fiendishly difficult to tie it back to a
real event in terms of what the program knows.

Add to that two other considerations:
  - SIGDANGER is not portable. While IBM may not mind having people write
    IBM-specific code, many programmers find that requirement objectionable
    (especially since it's hard to use; it's an anti-feature).
  - There's a defined way to report insufficient memory to a program (NULL
    from malloc()), and it happens in a way/place a programmer can use.
...and you can see why a programmer would get upset.

> So, do we go back to blowing up processes that allocate too much memory,
> even though that memory may actually be there by the time the process
> actually uses it?...
In the case of C programs and malloc(), yes. If you can't allocate usable
memory (meaning "usable" at the point of return from malloc()), you should
return NULL. That doesn't "blow up" the process; it gives it a fair chance
to decide what to do.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...While you were reading this, Motif grew by another kilobyte.