Xref: utzoo comp.editors:1220 gnu.emacs:2074 comp.unix.wizards:19983 Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!uc!uh!fin From: fin@uh.msc.umn.edu (Craig Finseth) Newsgroups: comp.editors,gnu.emacs,comp.unix.wizards Subject: Re: GNU Emacs, memory usage, releasing Keywords: GNU emacs malloc memory working set gap editor Message-ID: <1025@uc.msc.umn.edu> Date: 3 Jan 90 22:00:09 GMT References: <1558@aber-cs.UUCP> Sender: news@uc.msc.umn.edu Reply-To: fin@uh.UUCP (Craig Finseth) Organization: Minnesota Supercomputer Center, Minneapolis, MN Lines: 156 As the author of "The Emacs Cookbook" (actual title: "Theory and Practise of Text Editing --or-- A Cookbook for an Emacs", MIT Laboratory for Computer Science (LCS) Technical Memo 165, May 1980 (it was my B.S. thesis)), I feel that I can make a useful contribution to this discussion. Linked line vs. buffer gap: I mention in pages 33+ that of the two, buffer gap is the preferred technology for paged virtual memory systems. As a theoretical point, you're always going to be in trouble if your buffer size is larger than your (allowed by the o.s.) working set size. I contend that you are both in *less* trouble and the trouble point is moved up in a buffer gap system. Tim Bray makes an exellent point: "Hold on. What makes you think you know what the problem is? Have you instrumented it with the profiler and quantified the overhead of this "problem"? My own intuition is that GNUmacs spends much more time CONSing and otherwise chugging through the innards of the Lisp interpreter. But I wouldn't bet $0.50 on my intuition unless I had a good quantitative understanding of what's going on." Aside from a few pathological cases (RMAIL being a notable one), I would be surprized if gap motion was as significant factor on general editing operations. Editing is such a general activity, and GNU-Emacs is used for such a wide variety of purposes, that it would be impractical to optimize its performance in all cases. Instead, the trick (as in all programming) is to identify the frequent cases and optimize them. In particular, I consider editing small files and *viewing* large files to both be frequent: editing a large file (especially the "insert at beginning / insert at end / insert at beginning" case) is comparatively rare. Piercarlo Grandi then presented some data. I took the liberty of combining the user and system times (as we mainly care about wall clock time, not who is paying $$$) and figuring the incremental times for each operation: OPERATION Emacs 18.54 MicroGNU 2a Jove 4.12 total inc total inc total inc 1) startup 1.55 1.55 0.12 0.12 0.79 0.79 2) C-X C-F 2.47 0.92 1.15 1.03 1.84 1.05 3) C-X C-Q 2.62 0.15 1.21 0.06 1.86 0.02 4) SP 2.88 0.16 1.22 0.01 1.86 0.00 5) M-> 3.10 0.22 1.31 0.09 1.93 0.07 6) SP 3.29 0.19 1.31 0.00 1.93 0.00 7) SP 3.41 0.12 1.31 0.00 1.93 0.00 8) M-< 3.99 0.58 1.38 0.07 2.05 0.12 9) SP 4.35 0.36 1.57 0.19 2.05 0.00 Comments on Piercarlo's comments: 1) Is just to invoke an empty editor. GNU Emacs pays dearly for its size and complication here. 2) Reads 80KB in. GNU is relativley quick, it just copies the file to memory. MicroGnu reads it into memory and threads it into a list of lines, and Jove copies it to a temporary file and threads into a list the offsets of lines in this file. If "GNU is relativley quick" and GNU takes twice about as long as the others, what is the definition of "quickness"? 3) This is only to remove the RO property from the buffer. 4) Insert a space at the beginning of buffer. GNU has to move the gap, which is initially at the end of memory, and pays a lot here. The other two take less than a centisecond, GNU seven. This is 70 milliseconds for moving 80KB, or somewhat more than a MB per second on a 4 MIPS machine. Note the relatively high system time for Emacs. Thi is due mostly to redisplay. Actually, it takes GNU about the same time to do this as to clear the RO property. Moving the gap is thus probably not the major problem here. 5) Go to the end of buffer. This involves no gap movement, just redisplay. System times show it. On my 386 the frame buffer (an Hercules clone) is abjectly slow, and the device driver for it correspondinglu clocks up system time. Again, this takes about 50% longer than 4, with no gap motion involved. I would start to suspect redisplay as the real culprit... 6) Insert as space as the last chracter. This moves the gap again, and it shows. Also redisplay. Now we have a drop in time with the gap motion. Less redisplay, too. Again, I would really focus on the redisplay time and ignore the gap motion time (unless it is trivial to speed up). 7) Add a second space at the end. Just redisplay really, and minimal as to that. 8) Go back to the beginning of buffer. 9) insert another space there. 8 and 9 further confirm that gap motion is not the major problem WHEN CONSIDERING THE SYSTEM AS A WHOLE. From the same message: "The lesson I derive from these timings is that creating the linked list of lines, and especially copying the file to a temporary as well, slow down file reading time, but then further operations become very much faster. Note also that both MicrGnu and Jove are somewhat carelessly coded, with lots of quadratic algorithms." I claim that the data does not support this conclusion. This does not mean that the conclusion is incorrect. Rather, the data supports the conclusion that GNU-Emacs' redisplay is slow on the specified computer. This ananlysis parallels that supplied by Ian Dall. Jonathan Payne supplied an excellent summary of how a buffer gap editor works and the problems with redisplay. As it turns out, even basic redisplay is a hard problem (Knuth "The Art of Computer Programming" level 40). Doing a full redisplay correctly (handling variable-width characters, unlimited line lengths, multiple windows, etc.) is HARD (Knuth level 50). Some Historical Notes: Early drafts of the thesis had a chapter "proving" that only a mainframe-type computer could run a full Emacs-type editor. One brilliant insight (:-) later, the chapter was cut and a software company was formed (Mark of the Unicorn). The Mince text editor was not interactively extensible, but was otherwise a full Emacs running on a 48K CP/M system with small floppy disks. The scheme that was used is called paged buffer gap and it is briefly mentioned on page 36 of the thesis. The basic idea is that a file is represented as an array of pages. Each ~1K page is a separate buffer gap system. This representation was very efficient for the environment, especially as it formed a natural basis for the paged virtual memory environment required to edit files larger than will fit in memory. I contend that in a "modern workstation environment" (e.g., Sun 3/60), a simple buffer gap method is preferred over both paged buffer gap and linked line. I leave it as an excercise for the reader to figure out why. Craig A. Finseth fin@msc.umn.edu [CAF13] Minnesota Supercomputer Center, Inc. +1 612 624 3375