Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!spool.mu.edu!uwm.edu!ogicse!pdxgate!eecs!kirkenda From: kirkenda@eecs.cs.pdx.edu (Steve Kirkendall) Newsgroups: comp.editors Subject: Re: REPOST: editor.101 Summary: Elvis does it this way... Message-ID: <2857@pdxgate.UUCP> Date: 13 Jun 91 04:28:27 GMT References: <1991Jun4.155657.4357@mks.com> <1991Jun11.195732.25500@wpi.WPI.EDU> <1991Jun12.143930.14186@mks.com> <1991Jun12.202509.14825@wpi.WPI.EDU> Sender: news@pdxgate.UUCP Reply-To: kirkenda@eecs.UUCP (Steve Kirkendall) Organization: Portland State University, Portland, OR Lines: 73 In article <1991Jun12.202509.14825@wpi.WPI.EDU> rcarter@wpi.WPI.EDU (Randolph Carter (nee. Joseph H. Allen)) writes: >The ghost of ant@mks.com (Anthony Howe) writes: >>rcarter@wpi.WPI.EDU (Randolph Carter (nee. Joseph H. Allen)) writes: >>>a contiguous buffer up into pieces and add or delete characters from the edges >>>of the pieces. Each piece is stored in a malloc-like structure in a virtual >>>memory file. You have to maintain a list of all the pieces and search through >>>this list when you move out of a piece. You also glue pieces together when >>>possible. This is a neat technique, but it's complicated. > >One problem with this is that you have to seek to the correct piece. If, as >was done in the pointing editor, you keep a linked list of headers for the >pieces, then editing will slow down as more pieces are made. Another is that >the virtual memory system needs an malloc. > >With the scratch file technique, do you combine the files together when you >reach some maximum? Or am I misunderstanding it... This sounds a lot like the technique I used in Elvis. Of course, my perceptions are biased so I may be seeing similarities that aren't there, but... Elvis uses a temporary file for storing the edit buffer. This file is divided into blocks of 1k bytes apiece. Each block contains one or more whole lines; lines are not allowed to straddle a block boundary. Each line is terminated with newline character. Empty space (at the end of each block) is filled with NUL characters. The physical order of the blocks is irrelevent; Elvis maintains a table that maps logical block numbers into physical block numbers. When new text is inserted into the buffer, there is a chance that one of the old blocks will need to be split in two; if this happens, a new block will be allocated at the end of the scratch file, and the map table will be modified to logically map that block into the middle of the edit buffer. >You have to have some other structure that lets you quickly seek to the proper >entry. For example, have blocks with pointer/size pairs in them in the >virtual memory system which refer to blocks containing text (you can have >multiple levels of these). Then you can standard tree techniques (btree >techniques?) to adjust and balance the trees as you insert/delete. Elvis uses its map table, plus another table that stores the line number of the last line in each block. To locate a given line, you would scan the line-number table to determine which logical block contained the line, then use the block map table to convert the logical block number to a physical block number, read the block, and then scan through the block for the desired line. It's a lot faster than it sounds! >To make it practicle (you don't want to have to seek through all this for >every byte you access (except in the gap editor)), have a tiny cache >containing the last few seeks. The two tables are stored in RAM, so they can be scanned very quickly. (The block map table is periodically written to the disk, though, to allow for crash recovery.) I have several levels of caching, which allows elvis to skip most of the above steps for sequential accesses. The tables are very small. The block map table is an array of shorts, and its total size is limited to 1 block -- 1K bytes, normally, which translates into 512 entries. This limits elvis to editing files that are no larger than about half a megabyte. With a 2K byte block, the limit rises to two megabytes (twice as many entries, and each block contains twice as much text). By the way, elvis' most interesting source code is located in "modify.c" (the functions for inserting, deleting, and replacing text), with other fun stuff in "blk.c" (the block cache, and 'undo' support) and "tmp.c" (loading and saving text). Now, does anybody want to hear *why* I did it this way? ------------------------------------------------------------------------------- Steve Kirkendall kirkenda@cs.pdx.edu Grad student at Portland State U.