Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!hplabs!hpl-opus!walker From: walker@hpl-opus.HP.COM (Rick Walker) Newsgroups: comp.editors Subject: Re: What is buffer gap you ask? Message-ID: <62420008@hpl-opus.HP.COM> Date: 4 Jan 90 03:03:39 GMT References: <129846@sun.Eng.Sun.COM> Organization: HP Labs, High Speed Electronics Dept., Palo Alto, CA Lines: 85 / hpl-opus:comp.editors / jpayne%flam@Sun.COM (Jonathan Payne) / 11:41 am Jan 3, 1990 / > Buffer gap is just one way to store a bunch of characters you wish to > edit. (Another way is having a linked list of lines.) Emacs uses the > buffer gap mechanism, and here is what a buffer looks like in emacs. > > |<----- first half -----><----- gap -----><------ second half ------>| > ... (lots of explanation deleted...) > Anyway, so I hope this helps people understand buffer gap. I can't > imagine writing an editor any other way, even though it does have it's > disadvantages. I was also convinced that the buffer gap was the best way to write an editor until I read "RED, a better C screen editor" by Edward K. Ream in Dr. Dobbs Journal, Number 81, July 1983. The buffer routine describe therein uses virtual memory techniques and caching to speed up buffer access. Basically the edited file is read into a sequence of blocks (~1-2k in size). Each block has a pointer to the disk address of both the previous and next block of the file (ie, doubly linked), a dirty bit, and a count of how many characters are stored in the block, and some number of characters from the file. When inserting new text into a file, it is put in the current block if there remains room. If it will not fit, a new block is allocated, linked into the structure, and filled with the new data. Similiarly, when deleting, data blocks are merged or freed as needed and the empty disk blocks are added to the free list. Blocks which are modified have their dirty bit set. When browsing sequentially through a file, new blocks are read in as needed and stored in memory. The block slots in memory are cached in a Least Recently Used (LRU) fashion. This means that scrolling up and down in a file over a small range will result in zero disk accesses because all the file info is held in the block cache. When it is necessary to flush a block to have space to read in a new block, the least recently used block is replaced by the newly requested block. It is at this point that there can be a big performance improvement over the buffer gap method. In the buffer gap approach, data pulled from the top of the gap must always be copied to the bottom of the gap. This involves alot of slow copying if the buffer gap file is on disk. In the RED system, only modified pages must actually be written back to disk. If the page is not modified (ie, the dirty bit is not set on the page), then it is merely discarded from the cache when new space is needed. Simple sequential browsing through the file is therefore twice as fast as the buffer gap method (no disc writes required). If you do alot of scanning of your file (regexp searches, etc), the buffer gap approach will be twice as slow as the RED method. If initialization of the later tempfile blocks is deferred until they are actually needed, then the start up time for the editor is negligable. For example, one could initialize only the first couple of buffer blocks, start the editor with the cursor on line 1, and only read in later blocks when the user scrolled up to that point in the file. You may argue (correctly) that if the buffer gap algorithm is implemented on a virtual memory machine that all the VM stuff is handled by the operating system. The price you pay is that you have a large core image to be swapped around (have you ever heard anyone complain about the size of EMACS?). Another gotcha is that every sweep through the file in the buffer gap method modifies every single page of buffer memory. If the file is bigger than memory, and therefore must be paged, the gap method therefore forces the operating system to write all the pages. This makes very poor use of the disk swap system. RED on the other hand only modifies pages when necessary and can usually avoid having to write out stale pages. The RED method allows a configurable and continuous trade-off of core size vs cache miss ratio. If desired, a very small but usable editor can easily be configured. Finally, the buffer gap editor is not portable to non-VM systems without putting restrictions on the size of file that can be edited. In particular, files must fit in the available RAM instead of being limited to available disk size. Cheers, Rick Walker -- Rick Walker !hplabs!walker