Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!bloom-beacon!eru!luth!sunic!mcsun!ukc!icdoc!syma!aarons
From: aarons@syma.sussex.ac.uk (Aaron Sloman)
Newsgroups: comp.object
Subject: Re: [generational] Garbage collection -- difficulty, cost, generality
Keywords: garbage collection times, generational gc
Message-ID: <2372@syma.sussex.ac.uk>
Date: 20 Mar 90 01:56:20 GMT
References: <1990Mar13.011146.6019@Neon.Stanford.EDU> <2356@syma.sussex.ac.uk> <1990Mar15.231331.29545@Neon.Stanford.EDU>
Organization: School of Cognitive & Computing Sciences, Sussex Univ. UK
Lines: 123

wilson@carcoar.Stanford.EDU (Paul Wilson) writes:

> Date: 15 Mar 90 23:13:31 GMT
>
> In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman)
> writes:
> > I (Paul Wilson) write:
> >
> >> ...................Don't expect a fast language on stock
> >> hardware to have gc speed costs of less than about 10% on average.
> >> I think in the 8-15 percent range myself.
> >    ..........

> >(Sloman:)
> >I thought 8-15 percept was rather high, so I thought I would do some
> >experiments in Pop-11 (a lisp like language with a Pascal-like
> >syntax) in the Poplog system on a Sun4/260 (stock hardware I think).
> >
> > [ ...description of programs run, short resulting gc times...]
>
> (Wilson:)
> Two points:  I probably didn't make it clear in my posting that I was
> referring to *generational* gc's.

I did not realize that was your intention. Poplog does not at present
have a generational gc. It has a stop-and-copy gc for use when there's
lots of memory/swap space available and a shuffling Morris gc for use
when there isn't, or paging is too slow. It also allows the heap to be
"locked": e.g. if you have compiled a very large suite of programs, lock
the heap after compilation, and then only a small amount of store will
need to be copied on each GC, though all still has to be scanned. (This,
I believe, produces the advantages of a generational GC, except that it
isn't automated.)

Poplog's heap can have "holes" in it for items allocated by external
programs linked in (e.g. C, Fortran). It can also store non-relocatable
Poplog data in such holes, e.g. for use in connection with external
programs.

> ........   Also, I'm unclear about the
> system you're running -- I don't know much about Pop.
>
> The programs you're testing appear not to stress a gc much.

It's hard to stress the Poplog GC as it is very fast. So only
frequent GC's cause problems, and, as I said, you can do things
to reduce frequency, like expanding heap space.

> ... This is
> likely because they're short and they don't ever have much live
> data.

One reason I chose compilation for part of the test is that the amount
of "live" data steadily increases, at least with Poplog's incremental
compiler, which allocates procedures in heap space, along with
everything else. During compilation of the Prolog subsystem of Poplog,
GC time was about 3 per cent. When compiling the Common Lisp extension,
which increases the used heap area by a megabyte, the GC time was about
3.7 percent. Both percentages can be decreased by allocating more heap
space, and/or by locking the heap after each file is compiled.

> Running a small program several times is NOT at all like running a
> big program once.  Such experiments can be *very* misleadingly optimistic.

I agree, but part of my point was also that for a given program you can
vary the percentage of GC time by setting a small or large limit for the
heap size. By allocating a VERY large heap (system memory and swap space
permitting) I can reduce the frequency of garbage collections and
thereby substantially reduce the percentage of time spent on gc.

So statistics are more or less useless without full information about
such variables.

> Running programs that do a lot of number consing is particularly bad,
> because those objects are typically short-lived and will flatter
> your gc efficiency.  (Unless you store pointers to the heap-allocated
> numbers into objects in an older generation.  Then your performance
> can get bad fast.)
Alternatively you have very little spare heap space and therefore
have to reclaim space FREQUENTLY, in which case the performance will
also get bad.

> ....  In real long-running programs, the amount of live data
> *is* typically larger, sometimes dramatically larger, and the copying
> cost goes up significantly.

I guess what's real, or typical, is a matter of style, which is why
benchmarking is so difficult. I do a lot of real work in the poplog
editor - reading and sending mail, editing, developing programs,
examining online documentation, browsing through news bulletins, running
a Pop-11  based troff previewer, etc. etc. and many of the files I
regularly read into the editor for browsing and editing are a few
hundred kilobytes or more in size. The editor uses a vector of strings
for a file, and while you work on a line, the corresponding string will
be discarded every time it needs to grow, and every time you move off a
string with wasted space to edit another line. So garbage is being
created all the time. Yet with a big heap you can work for ages without
invoking a GC and if you don't actually trace GC's you won't notice them
because they are so fast (except on an overloaded shared machine).

> (But maybe Pop-11 doesn't have a generational gc.  If not, I guess
> it doesn't have a lot of heap-resident system stuff like big editors,
> browsers and compilers;  that would increase the pause times.)

No, it doesn't have a generational gc, and most of the core system
stuff, like the editor, compilers, utilities, etc are not in the
heap. However, there are large libraries of utilities, which get
added to the heap if used. (On VMS, SunOS, and Dynix, saved images
can be created in which user code and date are in a shared, non
garbage collectable, segment.)

Anyhow, it appears, from what you say, that generational GCs have a
serious performance penalty, and that it may be preferable to have a
very fast, infrequently invoked, alternative mechanism.

Apologies for letting this grow too long. It's all that heap size
available....

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QH, England
    EMAIL   aarons@cogs.sussex.ac.uk
            aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk