Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!pasteur!agate!shelby!neon!carcoar!wilson From: wilson@carcoar.Stanford.EDU (Paul Wilson) Newsgroups: comp.object Subject: Re: [generational] Garbage collection -- difficulty, cost, generality Keywords: garbage collection times, generational gc Message-ID: <1990Mar21.005340.3044@Neon.Stanford.EDU> Date: 21 Mar 90 00:53:40 GMT References: <1990Mar13.011146.6019@Neon.Stanford.EDU> <2356@syma.sussex.ac.uk> <1990Mar15.231331.29545@Neon.Stanford.EDU> <2372@syma.sussex.ac.uk> Sender: news@Neon.Stanford.EDU (USENET News System) Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson) Organization: U. of Illinois at Chicago (UIC, *not* UofC or UIUC) Lines: 198 In article <2372@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman) writes: >wilson@carcoar.Stanford.EDU (Paul Wilson) writes: > >> Date: 15 Mar 90 23:13:31 GMT >> >> In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman) >> writes: >> > I (Paul Wilson) write: >> > >> >> ...................Don't expect a fast language on stock >> >> hardware to have gc speed costs of less than about 10% on average. >> >> I think in the 8-15 percent range myself. >> > .......... > >> >(Sloman:) >> >I thought 8-15 percept was rather high, so I thought I would do some >> >experiments in Pop-11 (a lisp like language with a Pascal-like >> >syntax) in the Poplog system on a Sun4/260 (stock hardware I think). >> > >> > [ ...description of programs run, short resulting gc times...] >> >> (Wilson:) >> Two points: I probably didn't make it clear in my posting that I was >> referring to *generational* gc's. > >I did not realize that was your intention. Poplog does not at present >have a generational gc. It has a stop-and-copy gc for use when there's >lots of memory/swap space available and a shuffling Morris gc for use >when there isn't, or paging is too slow. It also allows the heap to be >"locked": e.g. if you have compiled a very large suite of programs, lock >the heap after compilation, and then only a small amount of store will >need to be copied on each GC, though all still has to be scanned. (This, >I believe, produces the advantages of a generational GC, except that it >isn't automated.) This is a reasonable intermediate strategy, between a simple gc and a generational one. Since scanning is several times faster than copying (as long as no paging is involved), it can be a win. It avoids the generation checking and/or marking of a fully generational gc, but still gives you part of the benefit. For a large system with large heap images (either system goop or application data), though, it will not do well. Scanning a many-megabyte image can thrash the system at every gc. >> ........ Also, I'm unclear about the >> system you're running -- I don't know much about Pop. >> >> The programs you're testing appear not to stress a gc much. > >It's hard to stress the Poplog GC as it is very fast. So only >frequent GC's cause problems, and, as I said, you can do things >to reduce frequency, like expanding heap space. But memory is not free. In a high-performance native-code-compiled system, your performance will generally drop off quickly at the point where you're allocating and scavenging more virtual memory than you can keep in physical memory. This is becoming more severe all the time, as processor speed gains continue to outstrip disk speed gains. Decreasing memory costs tend to counter this trend, but it's unclear what the bottom line is. (And the various trends have different impacts on different users.) >> ... This is >> likely because they're short and they don't ever have much live >> data. > >One reason I chose compilation for part of the test is that the amount >of "live" data steadily increases, at least with Poplog's incremental >compiler, which allocates procedures in heap space, along with >everything else. During compilation of the Prolog subsystem of Poplog, >GC time was about 3 per cent. When compiling the Common Lisp extension, >which increases the used heap area by a megabyte, the GC time was about >3.7 percent. Both percentages can be decreased by allocating more heap >space, and/or by locking the heap after each file is compiled. >> Running a small program several times is NOT at all like running a >> big program once. Such experiments can be *very* misleadingly optimistic. > >I agree, but part of my point was also that for a given program you can >vary the percentage of GC time by setting a small or large limit for the >heap size. By allocating a VERY large heap (system memory and swap space >permitting) I can reduce the frequency of garbage collections and >thereby substantially reduce the percentage of time spent on gc. Until you outrun physical memory and start paging. You have to trade off increased copying for increased locality. The interaction between heap sizes and gc efficiency is important, but can't be viewed in isolation. And a simple compiler may not be a good test for two reasons. As I said previously, the average amount of live intermediate data may be small, when compared to an optimizing compiler. But in your case, it would appear that the dominant cost would be the repeated copying of the "results" (the already- created procedure objects, etc.) while creating more results. This could get much worse, too. An optimizing compiler may do several times as much computation, allocating lots of space and forcing several times as many gc's. So the dominant cost may be increased by several times. >So statistics are more or less useless without full information about >such variables. Right. You need to know when to bill the gc for whatever page faults occur, rather than just measuring pause times. There's no simple answer, because it depends on how much memory you have, and how much of that you had to buy precisely because of the characteristics of the system you're measuring. >> Running programs that do a lot of number consing is particularly bad, >> because those objects are typically short-lived and will flatter >> your gc efficiency. (Unless you store pointers to the heap-allocated >> numbers into objects in an older generation. Then your performance >> can get bad fast.) >Alternatively you have very little spare heap space and therefore >have to reclaim space FREQUENTLY, in which case the performance will >also get bad. Right. A generational gc will tend to compensate somewhat for short- lived numbers, as long as you don't store pointers to them into old objects. But a non-generational gc will simply copy (or scan) all the data that much more often. (Though serious number consing may make the program enough slower than a well-compiled program that it will flatter the gc significantly more. Less number consing tends to go hand-in-hand with faster code, increasing the relative importance of the *rest* of the gc's costs... yikes!) >> .... In real long-running programs, the amount of live data >> *is* typically larger, sometimes dramatically larger, and the copying >> cost goes up significantly. >I guess what's real, or typical, is a matter of style, which is why >benchmarking is so difficult. I do a lot of real work in the poplog >editor - reading and sending mail, editing, developing programs, >examining online documentation, browsing through news bulletins, running >a Pop-11 based troff previewer, etc. etc. and many of the files I >regularly read into the editor for browsing and editing are a few >hundred kilobytes or more in size. The editor uses a vector of strings >for a file, and while you work on a line, the corresponding string will >be discarded every time it needs to grow, and every time you move off a >string with wasted space to edit another line. So garbage is being >created all the time. Yet with a big heap you can work for ages without >invoking a GC and if you don't actually trace GC's you won't notice them >because they are so fast (except on an overloaded shared machine). A lot of the (mostly interactive) things you mention *do* account for a lot of peoples' primary usage patterns. But for most of these things, raw general computation speed is likely not to be the limiting factor either -- the continual overhead of a generational gc probably won't be a big deal. You're more likely to be concerned with the speed of graphics & disk operations, etc. And it will allow you a more integrated system, with utilities implemented in the language and resident in the heap. >> (But maybe Pop-11 doesn't have a generational gc. If not, I guess >> it doesn't have a lot of heap-resident system stuff like big editors, >> browsers and compilers; that would increase the pause times.) > >No, it doesn't have a generational gc, and most of the core system >stuff, like the editor, compilers, utilities, etc are not in the >heap. However, there are large libraries of utilities, which get >added to the heap if used. (On VMS, SunOS, and Dynix, saved images >can be created in which user code and date are in a shared, non >garbage collectable, segment.) > >Anyhow, it appears, from what you say, that generational GCs have a >serious performance penalty, and that it may be preferable to have a >very fast, infrequently invoked, alternative mechanism. Right -- if you're willing to live with a less congenial development environment in creating the system. (You give up writing almost everything in the same language, with all of your debugging tools available.) And for a lot of people, it's still not going to work well. If your applications have megabytes of live data, or if they include large, heap-resident libraries, you're going to run into annoying gc pauses and overhead, or thrash. My own preference is for a generational gc, at least in a general development environment; its performance may have a higher baseline overhead, but it's more robust in the face of big hairy programs. You trade a little processor overhead for a lot of locality, and save significant money on memory. For a system that generates stand-alone programs, though, you'd like to have the option of specifying which gc strategy it should use, and have three choices. A simple gc will give the fastest performance for programs with little data live at any given time; a generational gc will be better for programs with lots of live data, and a fine-grained incremental gc will be necessary for difficult real-time applications. (The performance will be degraded significantly, but response times will be reliably predictable.) -- Paul Paul R. Wilson Software Systems Laboratory lab ph.: (312) 996-9216 U. of Illin. at C. EECS Dept. (M/C 154) wilson@bert.eecs.uic.edu Box 4348 Chicago,IL 60680