Xref: utzoo comp.arch:14538 comp.lang.lisp:2899 comp.lang.smalltalk:1757 comp.lang.misc:4398
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!decwrl!shelby!neon!carcoar!wilson
From: wilson@carcoar.Stanford.EDU (Paul Wilson)
Newsgroups: comp.arch,comp.lang.lisp,comp.lang.smalltalk,comp.lang.misc
Subject: why tagged immediate floats are good
Summary: heap-allocated floats degrade generational GC performance
Keywords: garbage collection, generational garbage collection, floating point
Message-ID: <1990Mar10.234555.150@Neon.Stanford.EDU>
Date: 10 Mar 90 23:45:55 GMT
Sender: root@Neon.Stanford.EDU (System PRIVILEGED Account)
Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson)
Organization: U. of Illinois at Chicago (UIC, *not* UofC or UIUC)
Lines: 103

I've gotten a lot of mail about *how* to do tagged immediate floats, but
also some asking *why* it's important.  A couple of people thought
that a generational gc should *alleviate* the problem of heap-allocated 
floats considerably;  they wanted to know why I thought the opposite.

It turns out that heap-allocated floats can be a BIG problem for a 
generational gc because they can violate the assumptions that the 
generational strategy depends on.

Generational gc's assume that most objects are short-lived, and that few
pointers are created that point from an object in an older generation
to an object in a younger generation.  Heap-allocating float values
can violate this assumption because a lot of unnecessary objects are
created *and* pointers to those objects are often stored into older
objects.

Consider the case of updating every element of a million-element 
(4-megabyte) array.  (Perhaps you're incrementing each element by 
the corresponding element of another array, or whatever.)

In this case, you allocate several words on the heap for each element
of your four-megabyte array -- probably 8 megabytes or so.  This will
cause your youngest generation to fill and be scavenged several times,
unless it is huge.

Not only will you perform several extra scavenges, but all of those
objects that you created will survive the scavenges, because they
are pointed to from the array in older memory.  That means you not
only allocated them, but you copy them at least once, to get
them out of the youngest generation.  So for each float value you
update, you spend a handful of instructions to heap allocate it
and at least another handful to copy what you allocated.


And things can get worse.  A generational gc must keep track of all
pointers from older generations into younger ones.  (This is
necessary to ensure that all live objects in the younger
generations can be found at scavenge time, and all pointers
to them can be updated.)

If you use the simple strategy of keeping a list ("remembered set") 
of all of the objects you've stored such pointers into, you can
run into trouble.  In the case of updating our million-element array, 
we'll have to scan the whole array at every scavenge that the extra
allocation forces.  That means scanning four megabytes every time
we allocate up through the youngest generation (probably a quarter
or a half megabyte of allocation.)  This not only increases processing
overhead, but is likely to cause thrashing.  

Tektronix Smalltalk's Large Object Space alleviates this problem
for many programs by managing large objects specially.  (See
Caudill & Wirfs-Brock's OOPSLA 86 paper and Ungar and Jackson's
OOPSLA 88 paper.)  My gc is also less sensitive to this sort
of thing because it keeps dirty bits for small fixed-size areas
of memory ("cards") rather than tracking objects.  (See my OOPSLA
89 paper.)

But neither of these tricks always works.  The large object space
doesn't work if you have a large data structure made up of small
objects, which gets promoted into an older generation and then
a lot of the objects get updated.  (This is a variant of the
"pig-in-the-snake" problem, to use Jon L White's term.)  And
in a really bad case, card marking may not help either -- your
cards may not be smaller than your objects.  So really bad
locality of writes can be bad news.

Note that a smart compiler, which keeps intermediate floating-point
values in registers or on a special stack (a la Maclisp or S-1 Lisp)
will *not* help a lot with this problem.  The floats that cause this
problem are not the intermediate values in an expression, they're
the results that get stored back into the heap. 


The upshot is that it's preferable to use immediate values when
you can, rather than creating heap objects that may violate the
assumptions of the gc.  If you can update a float value in
place, do it.

This seems to me particularly important because float-intensive
programs often *do* use lots of them in *big* data structures.

One tried-and-true way to punt on this at the language level
is to provide primitive homogeneous arrays. (e.g., array-of-double)

But that requires that the user do un-Lispy or un-Smalltalky
things -- writing FORTRAN-like programs.  And it won't work
for arbitrary non-array objects that just happen to have a float
in some slot.  If a user just uses a float because something
is conceptually a real number, but doesn't need really
high precision and fast math, it shouldn't choke the gc.

So my philosophy is to provide short immediate floats for
such "casual" purposes.  Then the issue becomes one of whether
to sacrifice a little precision to make room for a tag, or
to sacrifice a little speed to allow compacting most of them
without loss of precision.  

    -- Paul

Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680