Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!pasteur!agate!shelby!neon!carcoar!wilson
From: wilson@carcoar.Stanford.EDU (Paul Wilson)
Newsgroups: comp.object
Subject: Re: [generational] Garbage collection -- difficulty, cost, generality
Keywords: garbage collection times, generational gc
Message-ID: <1990Mar21.005340.3044@Neon.Stanford.EDU>
Date: 21 Mar 90 00:53:40 GMT
References: <1990Mar13.011146.6019@Neon.Stanford.EDU> <2356@syma.sussex.ac.uk> <1990Mar15.231331.29545@Neon.Stanford.EDU> <2372@syma.sussex.ac.uk>
Sender: news@Neon.Stanford.EDU (USENET News System)
Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson)
Organization: U. of Illinois at Chicago (UIC, *not* UofC or UIUC)
Lines: 198

In article <2372@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman) writes:
>wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>
>> Date: 15 Mar 90 23:13:31 GMT
>>
>> In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman)
>> writes:
>> > I (Paul Wilson) write:
>> >
>> >> ...................Don't expect a fast language on stock
>> >> hardware to have gc speed costs of less than about 10% on average.
>> >> I think in the 8-15 percent range myself.
>> >    ..........
>
>> >(Sloman:)
>> >I thought 8-15 percept was rather high, so I thought I would do some
>> >experiments in Pop-11 (a lisp like language with a Pascal-like
>> >syntax) in the Poplog system on a Sun4/260 (stock hardware I think).
>> >
>> > [ ...description of programs run, short resulting gc times...]
>>
>> (Wilson:)
>> Two points:  I probably didn't make it clear in my posting that I was
>> referring to *generational* gc's.
>
>I did not realize that was your intention. Poplog does not at present
>have a generational gc. It has a stop-and-copy gc for use when there's
>lots of memory/swap space available and a shuffling Morris gc for use
>when there isn't, or paging is too slow. It also allows the heap to be
>"locked": e.g. if you have compiled a very large suite of programs, lock
>the heap after compilation, and then only a small amount of store will
>need to be copied on each GC, though all still has to be scanned. (This,
>I believe, produces the advantages of a generational GC, except that it
>isn't automated.)

This is a reasonable intermediate strategy, between a simple gc
and a generational one.  Since scanning is several times faster
than copying (as long as no paging is involved), it can be a win.
It avoids the generation checking and/or marking of a fully generational
gc, but still gives you part of the benefit.  For a large system
with large heap images (either system goop or application data),
though, it will not do well.  Scanning a many-megabyte image can
thrash the system at every gc.


>> ........   Also, I'm unclear about the
>> system you're running -- I don't know much about Pop.
>>
>> The programs you're testing appear not to stress a gc much.
>
>It's hard to stress the Poplog GC as it is very fast. So only
>frequent GC's cause problems, and, as I said, you can do things
>to reduce frequency, like expanding heap space.

But memory is not free.  In a high-performance native-code-compiled
system, your performance will generally drop off quickly at the
point where you're allocating and scavenging more virtual memory
than you can keep in physical memory.  This is becoming more severe
all the time, as processor speed gains continue to outstrip disk
speed gains.  Decreasing memory costs tend to counter this
trend, but it's unclear what the bottom line is.  (And the
various trends have different impacts on different users.)

>> ... This is
>> likely because they're short and they don't ever have much live
>> data.
>
>One reason I chose compilation for part of the test is that the amount
>of "live" data steadily increases, at least with Poplog's incremental
>compiler, which allocates procedures in heap space, along with
>everything else. During compilation of the Prolog subsystem of Poplog,
>GC time was about 3 per cent. When compiling the Common Lisp extension,
>which increases the used heap area by a megabyte, the GC time was about
>3.7 percent. Both percentages can be decreased by allocating more heap
>space, and/or by locking the heap after each file is compiled.

>> Running a small program several times is NOT at all like running a
>> big program once.  Such experiments can be *very* misleadingly optimistic.
>
>I agree, but part of my point was also that for a given program you can
>vary the percentage of GC time by setting a small or large limit for the
>heap size. By allocating a VERY large heap (system memory and swap space
>permitting) I can reduce the frequency of garbage collections and
>thereby substantially reduce the percentage of time spent on gc.

Until you outrun physical memory and start paging.  You have to trade
off increased copying for increased locality.  The interaction
between heap sizes and gc efficiency is important, but can't
be viewed in isolation.  And a simple compiler may not be a good
test for two reasons.  As I said previously, the average amount of
live intermediate data may be small, when compared to an optimizing
compiler.  But in your case, it would appear that the dominant
cost would be the repeated copying of the "results" (the already-
created procedure objects, etc.) while creating more results.  This
could get much worse, too.  An optimizing compiler may do several
times as much computation, allocating lots of space and forcing
several times as many gc's.  So the dominant cost may be increased
by several times.

>So statistics are more or less useless without full information about
>such variables.

Right.  You need to know when to bill the gc for whatever page faults
occur, rather than just measuring pause times.  There's no simple
answer, because it depends on how much memory you have, and how much
of that you had to buy precisely because of the characteristics of
the system you're measuring.

>> Running programs that do a lot of number consing is particularly bad,
>> because those objects are typically short-lived and will flatter
>> your gc efficiency.  (Unless you store pointers to the heap-allocated
>> numbers into objects in an older generation.  Then your performance
>> can get bad fast.)
>Alternatively you have very little spare heap space and therefore
>have to reclaim space FREQUENTLY, in which case the performance will
>also get bad.

Right.  A generational gc will tend to compensate somewhat for short-
lived numbers, as long as you don't store pointers to them into old
objects.  But a non-generational gc will simply copy (or scan) all
the data that much more often.  (Though serious number consing may
make the program enough slower than a well-compiled program that
it will flatter the gc significantly more. Less number consing
tends to go hand-in-hand with faster code, increasing the relative
importance of the *rest* of the gc's costs... yikes!)

>> ....  In real long-running programs, the amount of live data
>> *is* typically larger, sometimes dramatically larger, and the copying
>> cost goes up significantly.

>I guess what's real, or typical, is a matter of style, which is why
>benchmarking is so difficult. I do a lot of real work in the poplog
>editor - reading and sending mail, editing, developing programs,
>examining online documentation, browsing through news bulletins, running
>a Pop-11  based troff previewer, etc. etc. and many of the files I
>regularly read into the editor for browsing and editing are a few
>hundred kilobytes or more in size. The editor uses a vector of strings
>for a file, and while you work on a line, the corresponding string will
>be discarded every time it needs to grow, and every time you move off a
>string with wasted space to edit another line. So garbage is being
>created all the time. Yet with a big heap you can work for ages without
>invoking a GC and if you don't actually trace GC's you won't notice them
>because they are so fast (except on an overloaded shared machine).

A lot of the (mostly interactive) things you mention *do* account for
a lot of peoples' primary usage patterns.  But for most of these
things, raw general computation speed is likely not to be the limiting 
factor either -- the continual overhead of a generational gc probably 
won't be a big deal.  You're more likely to be concerned with the speed of
graphics & disk operations, etc.  And it will allow you a more 
integrated system, with utilities implemented in the language and 
resident in the heap.

>> (But maybe Pop-11 doesn't have a generational gc.  If not, I guess
>> it doesn't have a lot of heap-resident system stuff like big editors,
>> browsers and compilers;  that would increase the pause times.)
>
>No, it doesn't have a generational gc, and most of the core system
>stuff, like the editor, compilers, utilities, etc are not in the
>heap. However, there are large libraries of utilities, which get
>added to the heap if used. (On VMS, SunOS, and Dynix, saved images
>can be created in which user code and date are in a shared, non
>garbage collectable, segment.)
>
>Anyhow, it appears, from what you say, that generational GCs have a
>serious performance penalty, and that it may be preferable to have a
>very fast, infrequently invoked, alternative mechanism.

Right -- if you're willing to live with a less congenial development
environment in creating the system.  (You give up writing almost
everything in the same language, with all of your debugging tools
available.)  And for a lot of people, it's still not going to work
well.  If your applications have megabytes of live data, or if
they include large, heap-resident libraries,  you're going
to run into annoying gc pauses and overhead, or thrash.  My own
preference is for a generational gc, at least in a general
development environment;  its performance may have a higher
baseline overhead, but it's more robust in the face of big hairy
programs.  You trade a little processor overhead for a lot of
locality, and save significant money on memory.

For a system that generates stand-alone programs, though, you'd
like to have the option of specifying which gc strategy it should
use, and have three choices.  A simple gc will give the fastest
performance for programs with little data live at any given
time;  a generational gc will be better for programs with
lots of live data, and a fine-grained incremental gc will
be necessary for difficult real-time applications.  (The
performance will be degraded significantly, but response
times will be reliably predictable.)


    -- Paul

Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680