Xref: utzoo comp.editors:1220 gnu.emacs:2074 comp.unix.wizards:19983
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!uc!uh!fin
From: fin@uh.msc.umn.edu (Craig Finseth)
Newsgroups: comp.editors,gnu.emacs,comp.unix.wizards
Subject: Re: GNU Emacs, memory usage, releasing
Keywords: GNU emacs malloc memory working set gap editor
Message-ID: <1025@uc.msc.umn.edu>
Date: 3 Jan 90 22:00:09 GMT
References: <1558@aber-cs.UUCP>
Sender: news@uc.msc.umn.edu
Reply-To: fin@uh.UUCP (Craig Finseth)
Organization: Minnesota Supercomputer Center, Minneapolis, MN
Lines: 156

As the author of "The Emacs Cookbook" (actual title: "Theory and
Practise of Text Editing --or-- A Cookbook for an Emacs", MIT
Laboratory for Computer Science (LCS) Technical Memo 165, May 1980 (it
was my B.S. thesis)), I feel that I can make a useful contribution to
this discussion.

Linked line vs. buffer gap: I mention in pages 33+ that of the two,
buffer gap is the preferred technology for paged virtual memory
systems.

	As a theoretical point, you're always going to be in trouble
	if your buffer size is larger than your (allowed by the o.s.)
	working set size.  I contend that you are both in *less*
	trouble and the trouble point is moved up in a buffer gap
	system.

Tim Bray makes an exellent point:

	"Hold on.  What makes you think you know what the problem is?  Have you
	instrumented it with the profiler and quantified the overhead of this
	"problem"?  My own intuition is that GNUmacs spends much more time
	CONSing and otherwise chugging through the innards of the Lisp
	interpreter.  But I wouldn't bet $0.50 on my intuition unless I had a
	good quantitative understanding of what's going on."

Aside from a few pathological cases (RMAIL being a notable one), I
would be surprized if gap motion was as significant factor on general
editing operations.

Editing is such a general activity, and GNU-Emacs is used for such a
wide variety of purposes, that it would be impractical to optimize its
performance in all cases.  Instead, the trick (as in all programming)
is to identify the frequent cases and optimize them.  In particular, I
consider editing small files and *viewing* large files to both be
frequent: editing a large file (especially the "insert at beginning /
insert at end / insert at beginning" case) is comparatively rare.


Piercarlo Grandi then presented some data.  I took the liberty of
combining the user and system times (as we mainly care about wall
clock time, not who is paying $$$) and figuring the incremental times
for each operation:

OPERATION	Emacs 18.54	MicroGNU 2a	Jove 4.12

		total	inc	total	inc	total	inc

1)  startup	1.55	1.55	0.12	0.12	0.79	0.79
2)  C-X C-F	2.47	0.92	1.15	1.03	1.84	1.05
3)  C-X C-Q	2.62	0.15	1.21	0.06	1.86	0.02
4)  SP		2.88	0.16	1.22	0.01	1.86	0.00
5)  M->		3.10	0.22	1.31	0.09	1.93	0.07
6)  SP		3.29	0.19	1.31	0.00	1.93	0.00
7)  SP		3.41	0.12	1.31	0.00	1.93	0.00
8)  M-<		3.99	0.58	1.38	0.07	2.05	0.12
9)  SP		4.35	0.36	1.57	0.19	2.05	0.00

Comments on Piercarlo's comments:

	1) Is just to invoke an empty editor. GNU Emacs pays dearly for its
	size and complication here.

	2) Reads 80KB in. GNU is relativley quick, it just copies the file
	to memory. MicroGnu reads it into memory and threads it into a list
	of lines, and Jove copies it to a temporary file and threads
	into a list the offsets of lines in this file.

If "GNU is relativley quick" and GNU takes twice about as long as the
others, what is the definition of "quickness"?

	3) This is only to remove the RO property from the buffer.

	4) Insert a space at the beginning of buffer. GNU has to move
	the gap, which is initially at the end of memory, and pays a
	lot here.  The other two take less than a centisecond, GNU
	seven. This is 70 milliseconds for moving 80KB, or somewhat
	more than a MB per second on a 4 MIPS machine. Note the
	relatively high system time for Emacs. Thi is due mostly to
	redisplay.

Actually, it takes GNU about the same time to do this as to clear the
RO property.  Moving the gap is thus probably not the major problem
here.

	5) Go to the end of buffer. This involves no gap movement,
	just redisplay.  System times show it. On my 386 the frame
	buffer (an Hercules clone) is abjectly slow, and the device
	driver for it correspondinglu clocks up system time.

Again, this takes about 50% longer than 4, with no gap motion
involved.  I would start to suspect redisplay as the real culprit...

	6) Insert as space as the last chracter. This moves the gap again, and
	it shows. Also redisplay.

Now we have a drop in time with the gap motion.  Less redisplay, too.
Again, I would really focus on the redisplay time and ignore the gap
motion time (unless it is trivial to speed up).

	7) Add a second space at the end. Just redisplay really, and
	minimal as to that.

	8) Go back to the beginning of buffer.

	9) insert another space there.

8 and 9 further confirm that gap motion is not the major problem WHEN
CONSIDERING THE SYSTEM AS A WHOLE.

From the same message:

	"The lesson I derive from these timings is that creating the
	linked list of lines, and especially copying the file to a
	temporary as well, slow down file reading time, but then
	further operations become very much faster. Note also that
	both MicrGnu and Jove are somewhat carelessly coded, with lots
	of quadratic algorithms."

I claim that the data does not support this conclusion.  This does not
mean that the conclusion is incorrect.  Rather, the data supports the
conclusion that GNU-Emacs' redisplay is slow on the specified computer.
This ananlysis parallels that supplied by Ian Dall.


Jonathan Payne supplied an excellent summary of how a buffer gap
editor works and the problems with redisplay.  As it turns out, even
basic redisplay is a hard problem (Knuth "The Art of Computer
Programming" level 40).  Doing a full redisplay correctly (handling
variable-width characters, unlimited line lengths, multiple windows,
etc.) is HARD (Knuth level 50).


Some Historical Notes:

Early drafts of the thesis had a chapter "proving" that only a
mainframe-type computer could run a full Emacs-type editor.  One
brilliant insight (:-) later, the chapter was cut and a software
company was formed (Mark of the Unicorn).  The Mince text editor was
not interactively extensible, but was otherwise a full Emacs running
on a 48K CP/M system with small floppy disks.

The scheme that	was used is called paged buffer gap and it is briefly
mentioned on page 36 of the thesis.  The basic idea is that a file is
represented as an array of pages.  Each ~1K page is a separate buffer
gap system.  This representation was very efficient for the
environment, especially as it formed a natural basis for the paged
virtual memory environment required to edit files larger than will fit
in memory.

I contend that in a "modern workstation environment" (e.g., Sun 3/60),
a simple buffer gap method is preferred over both paged buffer gap and
linked line.  I leave it as an excercise for the reader to figure out
why.

Craig A. Finseth			fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc.	+1 612 624 3375