Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!bloom-beacon!eru!luth!sunic!mcsun!ukc!icdoc!syma!aarons From: aarons@syma.sussex.ac.uk (Aaron Sloman) Newsgroups: comp.object Subject: Re: [generational] Garbage collection -- difficulty, cost, generality Keywords: garbage collection times, generational gc Message-ID: <2372@syma.sussex.ac.uk> Date: 20 Mar 90 01:56:20 GMT References: <1990Mar13.011146.6019@Neon.Stanford.EDU> <2356@syma.sussex.ac.uk> <1990Mar15.231331.29545@Neon.Stanford.EDU> Organization: School of Cognitive & Computing Sciences, Sussex Univ. UK Lines: 123 wilson@carcoar.Stanford.EDU (Paul Wilson) writes: > Date: 15 Mar 90 23:13:31 GMT > > In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman) > writes: > > I (Paul Wilson) write: > > > >> ...................Don't expect a fast language on stock > >> hardware to have gc speed costs of less than about 10% on average. > >> I think in the 8-15 percent range myself. > > .......... > >(Sloman:) > >I thought 8-15 percept was rather high, so I thought I would do some > >experiments in Pop-11 (a lisp like language with a Pascal-like > >syntax) in the Poplog system on a Sun4/260 (stock hardware I think). > > > > [ ...description of programs run, short resulting gc times...] > > (Wilson:) > Two points: I probably didn't make it clear in my posting that I was > referring to *generational* gc's. I did not realize that was your intention. Poplog does not at present have a generational gc. It has a stop-and-copy gc for use when there's lots of memory/swap space available and a shuffling Morris gc for use when there isn't, or paging is too slow. It also allows the heap to be "locked": e.g. if you have compiled a very large suite of programs, lock the heap after compilation, and then only a small amount of store will need to be copied on each GC, though all still has to be scanned. (This, I believe, produces the advantages of a generational GC, except that it isn't automated.) Poplog's heap can have "holes" in it for items allocated by external programs linked in (e.g. C, Fortran). It can also store non-relocatable Poplog data in such holes, e.g. for use in connection with external programs. > ........ Also, I'm unclear about the > system you're running -- I don't know much about Pop. > > The programs you're testing appear not to stress a gc much. It's hard to stress the Poplog GC as it is very fast. So only frequent GC's cause problems, and, as I said, you can do things to reduce frequency, like expanding heap space. > ... This is > likely because they're short and they don't ever have much live > data. One reason I chose compilation for part of the test is that the amount of "live" data steadily increases, at least with Poplog's incremental compiler, which allocates procedures in heap space, along with everything else. During compilation of the Prolog subsystem of Poplog, GC time was about 3 per cent. When compiling the Common Lisp extension, which increases the used heap area by a megabyte, the GC time was about 3.7 percent. Both percentages can be decreased by allocating more heap space, and/or by locking the heap after each file is compiled. > Running a small program several times is NOT at all like running a > big program once. Such experiments can be *very* misleadingly optimistic. I agree, but part of my point was also that for a given program you can vary the percentage of GC time by setting a small or large limit for the heap size. By allocating a VERY large heap (system memory and swap space permitting) I can reduce the frequency of garbage collections and thereby substantially reduce the percentage of time spent on gc. So statistics are more or less useless without full information about such variables. > Running programs that do a lot of number consing is particularly bad, > because those objects are typically short-lived and will flatter > your gc efficiency. (Unless you store pointers to the heap-allocated > numbers into objects in an older generation. Then your performance > can get bad fast.) Alternatively you have very little spare heap space and therefore have to reclaim space FREQUENTLY, in which case the performance will also get bad. > .... In real long-running programs, the amount of live data > *is* typically larger, sometimes dramatically larger, and the copying > cost goes up significantly. I guess what's real, or typical, is a matter of style, which is why benchmarking is so difficult. I do a lot of real work in the poplog editor - reading and sending mail, editing, developing programs, examining online documentation, browsing through news bulletins, running a Pop-11 based troff previewer, etc. etc. and many of the files I regularly read into the editor for browsing and editing are a few hundred kilobytes or more in size. The editor uses a vector of strings for a file, and while you work on a line, the corresponding string will be discarded every time it needs to grow, and every time you move off a string with wasted space to edit another line. So garbage is being created all the time. Yet with a big heap you can work for ages without invoking a GC and if you don't actually trace GC's you won't notice them because they are so fast (except on an overloaded shared machine). > (But maybe Pop-11 doesn't have a generational gc. If not, I guess > it doesn't have a lot of heap-resident system stuff like big editors, > browsers and compilers; that would increase the pause times.) No, it doesn't have a generational gc, and most of the core system stuff, like the editor, compilers, utilities, etc are not in the heap. However, there are large libraries of utilities, which get added to the heap if used. (On VMS, SunOS, and Dynix, saved images can be created in which user code and date are in a shared, non garbage collectable, segment.) Anyhow, it appears, from what you say, that generational GCs have a serious performance penalty, and that it may be preferable to have a very fast, infrequently invoked, alternative mechanism. Apologies for letting this grow too long. It's all that heap size available.... Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons@cogs.sussex.ac.uk aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk