Path: utzoo!attcan!uunet!lll-winken!lll-ncis!helios.ee.lbl.gov!pasteur!ucbvax!agate!bionet!csd4.milw.wisc.edu!uxc!uxc.cso.uiuc.edu!uxg.cso.uiuc.edu!uicbert.eecs.uic.edu!wilson
From: wilson@uicbert.eecs.uic.edu
Newsgroups: comp.lang.lisp
Subject: Re: big programs? data use?
Message-ID: <63200007@uicbert.eecs.uic.edu>
Date: 14 Jan 89 06:55:00 GMT
References: <63200006@uicbert.eecs.uic.edu>
Lines: 55
Nf-ID: #R:uicbert.eecs.uic.edu:63200006:uicbert.eecs.uic.edu:63200007:000:2718
Nf-From: uicbert.eecs.uic.edu!wilson Jan 14 00:55:00 1989

I've been wondering about this issue, both for KB simulation and for various other kinds of AI programs.

The responses to my original posting *seem* to be pretty consistent with what I'd been thinking: *most* people don't often have a lot of live program data. A few people have a tremendous amount, though.

There are problems with taking this unscientific poll seriously -- I have no idea how representative the responses are, and I don't know which is the chicken and which the egg. Would people use enormous amounts of memory if it cost significantly less? Maybe a lot of people need a lot of memory and they're all using FORTRAN, but will all switch to Common Lisp soon. Who knows?

Actually, for my purposes, huge amounts of data are ok if access patterns are reasonably uneven, that is, if there's decent locality of reference and particularly of *changes* to old data. I'm working on virtual copy mechanisms coordinated with garbage collection, and design tradeoffs depend on whether modified locations in older generations are typically pretty close to each other. (This is also important for garbage collectors that scan dirty pages of old memory to find all of the pointers into new memory.)

Can anybody who has a program that keeps a lot of live data around tell me whether much of it is altered, and especially whether much is altered very long after it's created?
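
For anyone who wants to measure this rather than guess, here is a minimal sketch in Python of the kind of bookkeeping involved: a write-barrier hook that logs which old-generation pages get written, so you can see how clustered the changes are. Everything in it is an assumption for illustration (the 4096-byte page size, the flat address range for the old generation, and the names `DirtyPageLog` and `record_write`); it is not taken from any particular Lisp system or collector.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class DirtyPageLog:
    """Hypothetical log of writes into an old generation, grouped by page."""

    def __init__(self, old_gen_start, old_gen_end, page_size=PAGE_SIZE):
        self.old_gen_start = old_gen_start
        self.old_gen_end = old_gen_end  # exclusive upper bound
        self.page_size = page_size
        self.writes_per_page = Counter()

    def record_write(self, address):
        """Write-barrier hook: count the store if it hits the old generation."""
        if self.old_gen_start <= address < self.old_gen_end:
            page = (address - self.old_gen_start) // self.page_size
            self.writes_per_page[page] += 1

    def dirty_pages(self):
        """Page indices a collector would have to scan, in ascending order."""
        return sorted(self.writes_per_page)

    def dirty_fraction(self):
        """Fraction of old-generation pages written at least once."""
        span = self.old_gen_end - self.old_gen_start
        total_pages = (span + self.page_size - 1) // self.page_size
        return len(self.writes_per_page) / total_pages


# A 16-page old generation with five stores, one of them outside the range.
log = DirtyPageLog(0, 16 * PAGE_SIZE)
for address in (100, 200, 4095, 2 * PAGE_SIZE + 10, 70000):
    log.record_write(address)

print(log.dirty_pages())     # [0, 2]
print(log.dirty_fraction())  # 0.125
```

A low dirty fraction with many writes per dirty page would suggest good locality of changes; a fraction near 1.0 would be the widely-distributed case I'm worried about.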

I'm particularly interested in locality in AI systems like RETE matchers, and in simulation languages. I don't know either of them well enough to guess. My understanding is that in Prolog systems, most changed fields are in an activation stack or near it.

I'm pretty firmly convinced that most programs that have a huge amount of data don't ever modify most of it, but instead just keep it sitting around (and search through it) until it becomes garbage. In the extreme, this is trivially true because if a program creates objects fast enough, it won't have any time to modify them. Short of that extreme, I'm trying to get a handle on typical distributions of changes among data objects (and especially clumps of data objects allocated around the same time).

Does anybody have any counterexamples? That is, programs that generate a lot of data that lives quite a while AND that go back through that data making widely-distributed changes now and then? Maybe an enormous cellular automaton or something? Do normal simulation programs often have such nasty tendencies?

Any impressions are welcome.

-- Paul

Paul R. Wilson
Human-Computer Interaction Laboratory     lab ph.: (312) 413-0042
U. of Ill. at Chi. EECS Dept. (M/C 154)   wilson@uicbert.eecs.uic.edu
Box 4348   Chicago, IL 60680