Xref: utzoo comp.editors:1216 comp.emacs:7514 Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!cs.utexas.edu!sun-barr!newstop!grapevine!sun!flam!jpayne From: jpayne%flam@Sun.COM (Jonathan Payne) Newsgroups: comp.editors,comp.emacs Subject: What is buffer gap you ask? Message-ID: <129846@sun.Eng.Sun.COM> Date: 3 Jan 90 19:41:09 GMT Sender: news@sun.Eng.Sun.COM Lines: 154 Since my posting I have had a lot of people ask what buffer gap is. So instead of answering each in turn I figured I would just post a message. I learned about buffer gap in the Emacs Cookbook that was written at MIT a million years ago. I think it was someone's master's thesis. I didn't see it implemented until Gosling's emacs. Buffer gap is just one way to store a bunch of characters you wish to edit. (Another way is having a linked list of lines.) Emacs uses the buffer gap mechanism, and here is what a buffer looks like in emacs. |<----- first half -----><----- gap -----><------ second half ------>| An emacs buffer is just one huge array of characters, with a gap in the middle. There are no characters in the gap. So at any given time the buffer is described as the characters on the left side of the gap, followed by the characters on the right side of the gap. Why is there a gap? Well if there isn't a gap and you want to insert one character at the beginning of the buffer, you would have to copy all the characters over to the right by one, and then stick in the character you were inserting. That's ridiculous, right? Imagine editing a 100kbyte file and inserting characters at the beginning. The editor would be spending the entire time copying characters over to the right one. Delete would be done by copying characters to the left, just as slow. With a buffer gap, editing operations are achieved by moving the gap to the place in the buffer where you want to make the change, and then either by shrinking the size of the gap for insertions, or increasing the size of the gap for deletions. In case it isn't obvious, this makes insertion and deletion incredible simple to implement. To insert a character C at position POS in a buffer, you would do something like this: /* Move the gap to position POS. That is, make the first half be POS characters long. */ Buffer_MoveGap(b, pos); b->data[b->first_half++] = c; b->gap_size -= 1; There, done. The gap is now one character smaller because we made the first half bigger while sticking in the character we wished to insert. Actually, at the beginning of the routine there should have been a check to make sure that there is room in the gap for more characters. When the gapsize is 0, it is necessary to realloc the entire buffer. Deletion is even easier. To delete N characters at POS, all you do is make the gap bigger! That is, by making the gap bigger, you take away from characters that used to be part of the buffer. Buffer_MoveGap(b, pos); b->gap_size += n; That is delete, folks. Moving the gap is pretty trivial. Just decide if you are moving it forward or backward, and use bcopy. bcopy is smart enough to handle overlapping regions. So, when emacs reads in a file it allocates enough memory for the entire file, plus maybe 1024 bytes for the buffer gap. Initially the buffer gap is at the end of the file, so when you insert at the beginning of the file right after reading it in, you will notice a longer delay than usual, because it first has to move the gap to the beginning of the file. The gap only has to be moved when you are doing edit operations. To examine the contents of a buffer, you can define a macro: Buffer_CharAt(b, pos) All this does it check to see whether pos is in the first half or the second half, and then index that one HUGE array of characters correctly. It's surprisingly fast, actually. Somebody mentioned that GNU search takes two strings to search in. That was Stallman's way of optimizing the hell out of the search. The two strings passed in represent the first half and the second half of the buffer. Now the search code does not have to use the Buffer_CharAt macro to examine the buffer because it is guarunteed to have two contiguous strings of valid data. That's a good idea - I might have to adopt that approach. Notice that with this approach to storing the file, you can edit binary files without thinking about it. There are no line length limits. Newline characters are just normal characters, which you can search for if you want. Also notice that to read a file, you do the following: fstat(fd, &stbuf); /* make sure the gap is at least as big as the file */ Buffer_EnsureGapSize(b, stbuf.st_size); read(fd, &b->data[b->first_size], stbuf.st_size); There, we just inserted a file into the buffer ... with ONE call to the read system call. Straight into the buffer, not into a temporary buffer. This is FAST. On my Sun 3/60 it takes about a second to read in a megabyte. Redisplay. One of the hard things with buffer gap is coming up with a good, fast redisplay algorithm. With a linked list approach, it's really easy to write one. It's important that internally the editor can munge around and make changes to the buffer, and then just make one call to redisplay when it's done. And the redisplay should be smart enough to use insert and delete line capabilities of the terminal/window system for scrolling and for when lines are inserted and deleted. The way any smart redisplay algorithm should work is in stages: 1) Layout the window. This means figure out which lines are going to go where; basically how the screen should look when the redisplay is done, without actually doing any output to the screen. 2) Compare this layout to the layout of the screen after the last redisplay. This boils down to comparing lines to see if the same lines are in the same place as before. When you scroll, all the lines will be off by how ever many lines you scrolled by. This is easy to detect, and when you do, you just use the terminals insert/delete line capability to make the redisplay faster. 3) Repaint the lines which are different from before. Lines are different either because they have been changed, or because they weren't even visible before. So how do you actually compare lines quickly? You can't do a string comparison, because the scrolling algorithms have to do lots of comparisons. With a linked list approach to storing a buffer, you can just compare the line pointers (a simple integer comparison!) in phase 2 (above). So it's nice and fast. But what do you do for buffer gap implementations? Well there are a couple of solutions, and I can tell you two of them. The first is Gosling's approach (which GNU still uses to a certain extent) which is to calculate a hash value for each line based on the contents of the line as it would appear on the screen, and then using those hash values to do phase 2 of the redisplay. This has two bad problems in my opinion. The first is that it takes a long time to calculate those hash values, and second, since it's based on the contents of the lines and not the physical position of those lines in the file, the redisplay can be soooooo smart that it does unintuitive things. For instance, if you have duplicate code (shame shame shame) and you type ^V to move forward a page, emacs will sometimes scroll BACKWARDS on you! It's very confusing, even if it is the fastest way to update the screen. Since it takes so long to calculate those hash values, Gosling and Stallman had to kludge up the redisplay algorithm to try and optimize the 90% case of just inserting characters. That's a shame and isn't necessary. I am not sure if I am allowed to describe the fast way to do this, but if you are interested, I will see whether my managers will complain. Anyway, so I hope this helps people understand buffer gap. I can't imagine writing an editor any other way, even though it does have it's disadvantages.