Xref: utzoo comp.arch:14538 comp.lang.lisp:2899 comp.lang.smalltalk:1757 comp.lang.misc:4398 Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!decwrl!shelby!neon!carcoar!wilson From: wilson@carcoar.Stanford.EDU (Paul Wilson) Newsgroups: comp.arch,comp.lang.lisp,comp.lang.smalltalk,comp.lang.misc Subject: why tagged immediate floats are good Summary: heap-allocated floats degrade generational GC performance Keywords: garbage collection, generational garbage collection, floating point Message-ID: <1990Mar10.234555.150@Neon.Stanford.EDU> Date: 10 Mar 90 23:45:55 GMT Sender: root@Neon.Stanford.EDU (System PRIVILEGED Account) Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson) Organization: U. of Illinois at Chicago (UIC, *not* UofC or UIUC) Lines: 103 I've gotten a lot of mail about *how* to do tagged immediate floats, but also some asking *why* it's important. A couple of people thought that a generational gc should *alleviate* the problem of heap-allocated floats considerably; they wanted to know why I thought the opposite. It turns out that heap-allocated floats can be a BIG problem for a generational gc because they can violate the assumptions that the generational strategy depends on. Generational gc's assume that most objects are short-lived, and that few pointers are created that point from an object in an older generation to an object in a younger generation. Heap-allocating float values can violate this assumption because a lot of unnecessary objects are created *and* pointers to those objects are often stored into older objects. Consider the case of updating every element of a million-element (4-megabyte) array. (Perhaps you're incrementing each element by the corresponding element of another array, or whatever.) In this case, you allocate several words on the heap for each element of your four-megabyte array -- probably 8 megabytes or so. This will cause your youngest generation to fill and be scavenged several times, unless it is huge. Not only will you perform several extra scavenges, but all of those objects that you created will survive the scavenges, because they are pointed to from the array in older memory. That means you not only allocated them, but you copy them at least once, to get them out of the youngest generation. So for each float value you update, you spend a handful of instructions to heap allocate it and at least another handful to copy what you allocated. And things can get worse. A generational gc must keep track of all pointers from older generations into younger ones. (This is necessary to ensure that all live objects in the younger generations can be found at scavenge time, and all pointers to them can be updated.) If you use the simple strategy of keeping a list ("remembered set") of all of the objects you've stored such pointers into, you can run into trouble. In the case of updating our million-element array, we'll have to scan the whole array at every scavenge that the extra allocation forces. That means scanning four megabytes every time we allocate up through the youngest generation (probably a quarter or a half megabyte of allocation.) This not only increases processing overhead, but is likely to cause thrashing. Tektronix Smalltalk's Large Object Space alleviates this problem for many programs by managing large objects specially. (See Caudill & Wirfs-Brock's OOPSLA 86 paper and Ungar and Jackson's OOPSLA 88 paper.) My gc is also less sensitive to this sort of thing because it keeps dirty bits for small fixed-size areas of memory ("cards") rather than tracking objects. (See my OOPSLA 89 paper.) But neither of these tricks always works. The large object space doesn't work if you have a large data structure made up of small objects, which gets promoted into an older generation and then a lot of the objects get updated. (This is a variant of the "pig-in-the-snake" problem, to use Jon L White's term.) And in a really bad case, card marking may not help either -- your cards may not be smaller than your objects. So really bad locality of writes can be bad news. Note that a smart compiler, which keeps intermediate floating-point values in registers or on a special stack (a la Maclisp or S-1 Lisp) will *not* help a lot with this problem. The floats that cause this problem are not the intermediate values in an expression, they're the results that get stored back into the heap. The upshot is that it's preferable to use immediate values when you can, rather than creating heap objects that may violate the assumptions of the gc. If you can update a float value in place, do it. This seems to me particularly important because float-intensive programs often *do* use lots of them in *big* data structures. One tried-and-true way to punt on this at the language level is to provide primitive homogeneous arrays. (e.g., array-of-double) But that requires that the user do un-Lispy or un-Smalltalky things -- writing FORTRAN-like programs. And it won't work for arbitrary non-array objects that just happen to have a float in some slot. If a user just uses a float because something is conceptually a real number, but doesn't need really high precision and fast math, it shouldn't choke the gc. So my philosophy is to provide short immediate floats for such "casual" purposes. Then the issue becomes one of whether to sacrifice a little precision to make room for a tag, or to sacrifice a little speed to allow compacting most of them without loss of precision. -- Paul Paul R. Wilson Software Systems Laboratory lab ph.: (312) 996-9216 U. of Illin. at C. EECS Dept. (M/C 154) wilson@bert.eecs.uic.edu Box 4348 Chicago,IL 60680