Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!hplabs!hpl-opus!walker
From: walker@hpl-opus.HP.COM (Rick Walker)
Newsgroups: comp.editors
Subject: Re: What is buffer gap you ask?
Message-ID: <62420008@hpl-opus.HP.COM>
Date: 4 Jan 90 03:03:39 GMT
References: <129846@sun.Eng.Sun.COM>
Organization: HP Labs, High Speed Electronics Dept., Palo Alto, CA
Lines: 85

/ hpl-opus:comp.editors / jpayne%flam@Sun.COM (Jonathan Payne) / 11:41 am  Jan  3, 1990 /

> Buffer gap is just one way to store a bunch of characters you wish to
> edit.  (Another way is having a linked list of lines.)  Emacs uses the
> buffer gap mechanism, and here is what a buffer looks like in emacs.
> 
> |<----- first half -----><----- gap -----><------ second half ------>|
> 
  ... (lots of explanation deleted...)

> Anyway, so I hope this helps people understand buffer gap.  I can't
> imagine writing an editor any other way, even though it does have it's
> disadvantages.

I was also convinced that the buffer gap was the best way to write an
editor until I read "RED, a better C screen editor" by Edward K. Ream 
in Dr. Dobbs Journal, Number 81, July 1983.

The buffer routine describe therein uses virtual memory techniques and
caching to speed up buffer access.  Basically the edited file is read
into a sequence of blocks (~1-2k in size).  Each block has a pointer to
the disk address of both the previous and next block of the file (ie, doubly
linked), a dirty bit, and a count of how many characters are stored in
the block, and some number of characters from the file.

When inserting new text into a file, it is put in the current block if
there remains room. If it will not fit, a new block is allocated, linked
into the structure, and filled with the new data.  Similiarly, when
deleting, data blocks are merged or freed as needed and the empty disk
blocks are added to the free list.  Blocks which are modified have their
dirty bit set.

When browsing sequentially through a file, new blocks are read in as needed
and stored in memory.  The block slots in memory are cached in a Least
Recently Used (LRU) fashion.  This means that scrolling up and down in a file
over a small range will result in zero disk accesses because all the file
info is held in the block cache.

When it is necessary to flush a block to have space to read in a new
block, the least recently used block is replaced by the newly requested
block.  It is at this point that there can be a big performance
improvement over the buffer gap method.  In the buffer gap approach,
data pulled from the top of the gap must always be copied to the bottom
of the gap.  This involves alot of slow copying if the buffer gap file
is on disk.  In the RED system, only modified pages must actually be
written back to disk.  If the page is not modified (ie, the dirty bit is
not set on the page), then it is merely discarded from the cache when
new space is needed.  Simple sequential browsing through the file is
therefore twice as fast as the buffer gap method (no disc writes
required).  If you do alot of scanning of your file (regexp searches,
etc), the buffer gap approach will be twice as slow as the RED method.

If initialization of the later tempfile blocks is deferred until they are
actually needed, then the start up time for the editor is negligable.
For example, one could initialize only the first couple of buffer
blocks, start the editor with the cursor on line 1, and only read in later
blocks when the user scrolled up to that point in the file.

You may argue (correctly) that if the buffer gap algorithm is
implemented on a virtual memory machine that all the VM stuff is handled
by the operating system.  The price you pay is that you have a large
core image to be swapped around  (have you ever heard anyone complain about
the size of EMACS?).  Another gotcha is that every sweep through the file
in the buffer gap method modifies every single page of buffer memory. If the
file is bigger than memory, and therefore must be paged, the gap method
therefore forces the operating system to write all the pages. This
makes very poor use of the disk swap system.  RED on the other hand
only modifies pages when necessary and can usually avoid having to write
out stale pages.

The RED method allows a configurable and continuous trade-off of core
size vs cache miss ratio.  If desired, a very small but usable editor
can easily be configured.

Finally, the buffer gap editor is not portable to non-VM systems without
putting restrictions on the size of file that can be edited.  In
particular, files must fit in the available RAM instead of being limited
to available disk size. 

Cheers,
Rick Walker

--
Rick Walker
!hplabs!walker