Path: utzoo!attcan!uunet!lll-winken!lll-ncis!helios.ee.lbl.gov!pasteur!ucbvax!agate!bionet!csd4.milw.wisc.edu!uxc!uxc.cso.uiuc.edu!uxg.cso.uiuc.edu!uicbert.eecs.uic.edu!wilson
From: wilson@uicbert.eecs.uic.edu
Newsgroups: comp.lang.lisp
Subject: Re: big programs?  data use?
Message-ID: <63200007@uicbert.eecs.uic.edu>
Date: 14 Jan 89 06:55:00 GMT
References: <63200006@uicbert.eecs.uic.edu>
Lines: 55
Nf-ID: #R:uicbert.eecs.uic.edu:63200006:uicbert.eecs.uic.edu:63200007:000:2718
Nf-From: uicbert.eecs.uic.edu!wilson    Jan 14 00:55:00 1989


I've been wondering about this issue, both for KB simulation and for
various other kinds of AI programs.

The responses to my original posting *seem* to be pretty consistent
with what I'd been thinking:  *most* people don't often have a lot
of live program data.  A few people have a tremendous amount, though.

There are problems with taking this unscientific poll seriously --
I have no idea how representative the responses are, and I don't
know which is the chicken and which the egg.  Would people use
enormous amounts of memory if it cost significantly less? Maybe a
lot of people need a lot of memory and they're all using FORTRAN,
but will all switch to Common Lisp soon.  Who knows? 

Actually, for my purposes, huge amounts of data are ok if access patterns
are reasonably uneven.  That is, if there's decent locality of reference
and particularly of *changes* to old data.  I'm working on virtual
copy mechanisms coordinated with garbage collection, and design
tradeoffs depend on whether modified locations in older generations
are typically pretty close to each other.  (This is also important
for garbage collectors that scan dirty pages of old memory to find
all of the pointers into new memory.)
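To make the tradeoff concrete: a collector that records dirty pages has to rescan every dirty page in full. The same number of writes can therefore cost very different amounts of scanning depending on how tightly the writes cluster. Here is a minimal sketch in Python. It assumes a flat, word-addressed old generation and a hypothetical page size of 512 words; neither figure comes from any particular system.

```python
PAGE_SIZE = 512  # words per page; hypothetical value chosen for illustration

def dirty_pages(write_addresses, page_size=PAGE_SIZE):
    """Set of page numbers touched by writes, i.e. what a
    page-granularity write barrier would mark dirty."""
    return {addr // page_size for addr in write_addresses}

def scan_cost(write_addresses, page_size=PAGE_SIZE):
    """Words the collector must scan if it examines each dirty page in full."""
    return len(dirty_pages(write_addresses, page_size)) * page_size

# 100 writes into old memory, with two different spatial distributions.
clustered = [10_000 + i for i in range(100)]  # 100 consecutive words
scattered = [i * 10_000 for i in range(100)]  # one write every 10,000 words

print(scan_cost(clustered))  # 512    (all writes land on one page)
print(scan_cost(scattered))  # 51200  (every write dirties its own page)
```

The same 100 writes cost either one page or a hundred pages of scanning. That spread is why the locality of changes to old data matters so much for these designs.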

Can anybody who has a program that keeps a lot of live data around
tell me whether much of it is altered, and especially whether much
is altered very long after it's created?

I'm particularly interested in locality in AI systems like RETE matchers,
and in simulation languages.  I don't know either of them well enough
to guess.  My understanding is that in Prolog systems, most changed
fields are in an activation stack or near it.

I'm pretty firmly convinced that most programs that have a huge amount
of data don't ever modify most of it, but instead just keep it sitting
around (and search through it) until it becomes garbage.  In the extreme,
this is trivially true because if a program creates objects fast
enough, it won't have any time to modify them.  Short of that extreme,
I'm trying to get a handle on typical distributions of changes among
data objects (and especially clumps of data objects allocated around
the same times).
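For anyone willing to instrument a program, the kind of distribution described above could be gathered with something like the following. This is a minimal sketch, not an existing tool. The class and its allocate()/write() hooks are hypothetical, the allocation count stands in for a clock, and the 256-allocation epoch width is arbitrary. Each write is bucketed two ways: by the object's age (in powers of two), and by the epoch in which the object was allocated, so a clump of objects born together shows up as a single epoch.

```python
from collections import Counter

EPOCH = 256  # allocations per epoch; arbitrary, for illustration only

class WriteTracer:
    """Hypothetical instrumentation hooks: record how old an object is
    (measured in allocations) each time it is modified."""

    def __init__(self):
        self.clock = 0                    # objects allocated so far
        self.birth = {}                   # object id -> allocation time
        self.age_histogram = Counter()    # log2 age bucket -> writes
        self.epoch_histogram = Counter()  # allocation epoch -> writes

    def allocate(self):
        obj_id = len(self.birth)          # ids are just sequence numbers here
        self.birth[obj_id] = self.clock
        self.clock += 1
        return obj_id

    def write(self, obj_id):
        born = self.birth[obj_id]
        age = self.clock - born
        # bucket 0 holds age 0; bucket k >= 1 holds ages in [2**(k-1), 2**k)
        self.age_histogram[age.bit_length()] += 1
        self.epoch_histogram[born // EPOCH] += 1


tracer = WriteTracer()
objs = [tracer.allocate() for _ in range(1000)]
tracer.write(objs[-1])  # age 1: modified right after allocation
tracer.write(objs[0])   # age 1000: modified long after allocation
print(sorted(tracer.age_histogram.items()))    # [(1, 1), (10, 1)]
print(sorted(tracer.epoch_histogram.items()))  # [(0, 1), (3, 1)]
```

Writes piling up in the low age buckets and in a few recent epochs would support the hunch above. A long tail spread across many old epochs would be the kind of counterexample asked for next.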

Does anybody have any counterexamples?  That is, programs that generate
a lot of data that lives quite a while AND which go back through that
data making widely-distributed changes now and then?  Maybe an enormous
cellular automaton or something?  Do normal simulation programs often
have such nasty tendencies?  Any impressions are welcome.

   -- Paul


Paul R. Wilson                         
Human-Computer Interaction Laboratory    lab ph.: (312) 413-0042
U. of Ill. at Chi. EECS Dept. (M/C 154)  wilson@uicbert.eecs.uic.edu
Box 4348   Chicago,IL 60680