Xref: utzoo comp.editors:1200 gnu.emacs:2056 comp.unix.wizards:19955
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!think!think.com
From: rlk@think.com (Robert Krawitz)
Newsgroups: comp.editors,gnu.emacs,comp.unix.wizards
Subject: Re: GNU Emacs, memory usage, releasing
Keywords: GNU emacs malloc memory working set gap editor
Message-ID: <32534@news.Think.COM>
Date: 31 Dec 89 18:53:27 GMT
References: <1558@aber-cs.UUCP>
Sender: news@Think.COM
Reply-To: rlk@think.com (Robert Krawitz)
Followup-To: comp.editors
Organization: Thinking Machines Corp., Cambridge MA
Lines: 53
cc: rlk
In-reply-to: pcg@aber-cs.UUCP (Piercarlo Grandi)

Very interesting note.  It explains a large number of observations I
have made over the years (I understood some of them long before
reading your note, as I did a lot of work on rmail around 1985, but this
ties a lot of stuff together).

1)  Rmail is very slow when getting new mail from an inbox.  I was aware
of this very early, and I understood why (the gap).  Rmail normally has
to convert each message to babyl format by making a few small edits on
each message.  When I worked with pmd (personal mail daemon), I put in
code to permit mailboxes to be written in rmail format, thereby not
requiring any conversion to be done.  This speeds up emacs
substantially.  However, certain operations (such as computing a summary
buffer) are still slow.  This is in part because rmail writes the
summary line into the message header (to cache it for future use).  I
was never in favor of this, but I never thought too hard about the fact
that it edits the buffer in the same pattern (a small edit in every
message).
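
To make the cost of the gap concrete, here is a toy gap buffer in
Python.  It is only a sketch of the general technique, not the code
Emacs actually uses; the class, the method names and the chars_moved
counter are all made up for illustration.  It counts how many
characters get copied when the gap has to follow the edits around.

```python
class GapBuffer:
    """Toy gap buffer: the text lives in one array with a movable hole."""

    def __init__(self, text, gap_size=64):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first free slot of the gap
        self.gap_end = len(self.buf)    # first text slot after the gap
        self.chars_moved = 0            # characters copied by gap moves

    def move_gap(self, pos):
        """Slide the gap so that it begins at text position pos."""
        while self.gap_start > pos:     # gap moves left, text moves right
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
            self.chars_moved += 1
        while self.gap_start < pos:     # gap moves right, text moves left
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1
            self.chars_moved += 1

    def insert(self, pos, s):
        """Insert string s at text position pos."""
        self.move_gap(pos)
        if self.gap_end - self.gap_start < len(s):
            # Gap too small: widen it (a real editor would realloc here).
            extra = len(s) + 64
            self.buf[self.gap_end:self.gap_end] = [None] * extra
            self.gap_end += extra
        for ch in s:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


def cost_of_edits(positions, size=10_000):
    """Characters copied while inserting one character at each position."""
    gb = GapBuffer("x" * size)
    for p in positions:
        gb.insert(p, "!")
    return gb.chars_moved


local_cost = cost_of_edits([500] * 20)            # edits all in one spot
scattered_cost = cost_of_edits([0, 10_000] * 10)  # edits jump end to end
```

In this model, edits that stay in one place pay for one long gap move
and then almost nothing, while edits that jump between the two ends of
the buffer copy nearly the whole text every time.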

BTW, a favorite benchmark of mine involves the following:  converting a
large number of messages (1000, say) to babyl format, and deleting and
expunging these same 1000 messages.  The messages are deliberately kept
small (a very small header and one line of body) to minimize paging
effects.  My experience was that the early IBM RT (which was otherwise a
real dog) could keep up with a Microvax II on this test, and that in
general RISC machines do extremely well on this test (they run emacs
very well in general, as it happens).

2)  Emacs dies very quickly after its virtual size exceeds 16 Mbytes,
due to the 24 bit pointers used (the top 8 bits are used as tag bits for
the Lisp interpreter).  I have frequently noticed that killing off old
buffers does not permit me to prolong the life of my emacs session, and
that an emacs with a Lisp buffer (which grows rapidly but erratically)
tends to run out of room quickly.  This I assume is due to the constant
realloc'ing going on.
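
The arithmetic behind the 16 Mbyte limit can be sketched as follows,
assuming a 32 bit word with the tag in the top 8 bits as described
above.  The function names are hypothetical and the exact bit layout
inside Emacs may differ; this only shows why 24 address bits cap the
heap at 16 Mbytes.

```python
WORD_BITS = 32                        # assumed machine word size
TAG_BITS = 8                          # top bits reserved for the type tag
VALUE_BITS = WORD_BITS - TAG_BITS     # 24 bits left for the address
VALUE_MASK = (1 << VALUE_BITS) - 1

# Number of distinct byte addresses a 24 bit pointer can name.
ADDRESSABLE_BYTES = 1 << VALUE_BITS


def make_object(tag, address):
    """Pack a tag and an address into one word (hypothetical layout)."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in the tag field")
    if not 0 <= address <= VALUE_MASK:
        raise OverflowError("address does not fit in 24 bits")
    return (tag << VALUE_BITS) | address


def object_tag(word):
    """Recover the tag from a packed word."""
    return word >> VALUE_BITS


def object_address(word):
    """Recover the address from a packed word."""
    return word & VALUE_MASK
```

Any object that malloc places at or above address 2**24 cannot be
represented at all in this scheme, which is consistent with emacs
failing soon after its virtual size passes 16 Mbytes.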

I don't necessarily agree that the issue is design for virtual memory
vs. swapping, by the way.  There is a general problem in emacs with a
lot of things being scaled poorly, or otherwise underdesigned.  Examples
include the 24 bit limit on integers (23 bit signed integers in Lisp,
24 bit pointers internally) and the inexplicable and seemingly gratuitous
divergences from Common Lisp.  The 24 bit integer/pointer problem
worried me even in 1985, but RMS wasn't too interested in hearing about
it.  The problem is only really showing up now (for example, my
Sparcstation has 16 MB of physical memory and 100 MB swap, and I run big
emacs processes).  Judging by your comments, the memory management
scheme was similarly unplanned.  I don't think it was designed with
swapping systems in mind; I simply don't think it was designed to any
great degree.  A real pity, since no other Unix editor shows any more
design.  I wish it had been done right in the first place.  It's not
clear to me that any of this will ever be fixed.
-- 
ames >>>>>>>>>  |	Robert Krawitz <rlk@think.com>	245 First St.
bloom-beacon >  |think!rlk	(postmaster)		Cambridge, MA  02142
harvard >>>>>>  .	Thinking Machines Corp.		(617)876-1111