Path: utzoo!attcan!uunet!tank!ncar!ames!oliveb!Ozona!chase
From: chase@Ozona.orc.olivetti.com (David Chase)
Newsgroups: comp.lang.c
Subject: Re: memory allocation
Message-ID: <29003@oliveb.olivetti.com>
Date: 14 Sep 88 01:09:43 GMT
References: <1262@micomvax.UUCP> <499@poseidon.UUCP>
Sender: news@oliveb.olivetti.com
Reply-To: chase@Ozona.UUCP (David Chase)
Distribution: comp.lang.c,comp.os.misc
Organization: Olivetti Research Center, Menlo Park, CA
Lines: 69

In article <499@poseidon.UUCP> psrc@poseidon.UUCP (Paul S. R. Chisholm) writes
(quoting a letter from Van Wyk to Bentley):
>	times).  I added the minimal bookkeeping necessary to keep my
>	own queue of free nodes of this particular kind, and reduced
>	the memory allocator's share of the time to about thirty
>	percent.
>The key point here is that Van Wyk measured his program (with a
>non-trivial test input), found where the program spent its time, and
>*then* optimized it.

This is one good approach (I've used it several times); another (if
you simply must write your own memory allocator) is to use somebody
else's benchmarks to ensure that you at least implement a good
algorithm.  Before writing yet another allocator, I suggest that (as a
minimum) you read "Dynamic Memory Allocation in Computer Simulation"
by Norman R. Nielsen, in CACM 20:11 (November 1977).  I don't know if
anyone has done a similar study of allocation techniques that have
been invented since then, but I'd like to hear about it if they have.

Here, I'll even save you the trouble.  By most measures, the best
algorithm maintains a free list for each node size that is requested
(NO ROUNDING UPWARD EXCEPT FOR WORD ALIGNMENT).  This algorithm was
typically fastest (though buddy blocks has better worst-case
performance) and nearly as effective in limited memory situations
("limited memory" = "sum of user requests is 80% of what is available,
ignoring all overhead imposed by allocator").  On the "base demand
list", this algorithm required only 85.6% of available memory to
satisfy a request for 80%; the best required 83.6%, but was 10 ten
times slower.

On a more rigorous set of 18 test loads, this algorithm was best at
50% (succeeded 18 times, max of 42 IBM 360/65 microseconds for
alloc-dealloc), best at 66.7 % (succeeded 18 times, 42 usecs) and
probably best at 80% (succeeded 15 times, 169 uSecs).  I say probably
best because only selected algorithms were given the full treatment,
and some other might have been more successful.  However, of the
algorithms given the full treatment, the most successful faster
(worst-case) algorithm only succeeded 7 times.

I should add that I've implemented a couple along these lines, and
played with a few that other people have implemented, and I am
happiest with the approach taken in the Boehm-Demers garbage collector
(available from the Rice archive server*, I think).  It uses multiple
free lists for blocks below a certain size (4K - overhead), and a
single free list for blocks larger than that.  It can easily be
convinced not to try to collect garbage, but unless you take delight
in writing truly squirrelly code, it will correctly take out the
trash.  For more information about this collector, see

  Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative
  Environment", Software Practice & Experience, to appear, September
  1988.

Weiser experimented with a special version of the collector for
tracing storage leaks; typically, the collector does a better job than
programmers.

I realize that electronic mail is at your fingertips, but I know that
there are university libraries containing the cited material in
Silicon Valley, Austin, Houston, and Boston, and probably other places
besides those.  I can't really recommend the UT or UH campuses, but
walks through Stanford, Rice, MIT, or Harvard are quite pleasant.  Get
off your computer butts and go read some of this stuff.

* mail -s "help" "archive-server@rice.edu" < /dev/null
  mail -s "send index sun-source" "archive-server@rice.edu" < /dev/null

David Chase
Olivetti Research Center, Menlo Park, CA