Path: utzoo!utgpu!attcan!uunet!ginosko!cs.utexas.edu!rice!brazos.rice.edu!pete From: pete@titan.rice.edu (Pete Keleher) Newsgroups: comp.sys.mac.programmer Subject: Re: Serious programming question Message-ID: Date: 23 Oct 89 01:01:44 GMT References: <89295.142753CXT105@PSUVM.BITNET> Sender: root@rice.edu Organization: Whatsamatta U Lines: 55 In-reply-to: CXT105@PSUVM.BITNET's message of 22 Oct 89 18:27:53 GMT > What sort of data structures are generally used for representing text in an > editor or word-processor? Obviously TextEdit is an alternative, but it's slo > (as far as I've heard) and imposes size restrictions. Not only does TextEdit impose size restrictions, but it's updating leaves a lot to be desired. One of my goals was to get the editor to look and feel as solid as Think C's editor, but not be bare-bones. Good question. I have looked at source for two editors and have written one of my own. The best data structure that I have heard of would be "chunks". A chunk would be a chunk of text, maybe 256 bytes, typically half full. The chunks are then linked together in a linked list. Insertion/deletion usually involves moving only some of the bytes in the current chunk. Sometimes, of course, you must split chunks or combine them. There is usually no array of line pointers, except maybe for the lines currently being displayed. Two important data structures could be declared something like: typedef struct Chunk { struct Chunk *next, *prev; short len; char text[256]; } Chunk; typedef struct { Chunk *chunk; char *cp; } mark; In my editor, I use one large chunk for the file image that is read from disk, along with an array of line pointers into the file image. When a line is modified, I allocate a chunk and go from there. When lines are inserted/deleted I need to move all of the line pointers after the line in question, but in practice this is cheap. The overhead of having data in the file image in memory but no longer used is also cheap. This is definitely NOT the way to do it if you're designing from scratch, but I made some poor decisions in the beginning (basically I put off the decision, and by the time I could see where to go, I couldn't do it without re-writing everything). The up side is that: 1) In the case where relatively little is modified, (say less than 30%), memory costs are low. 2) It is very fast. In particular, I can access any line as fast as any other. as opposed to the straight chunk approach where you have to search through all of the lines between the start of the file and your current position just to find out what line you are on. -- =========================================================================== Pete Keleher pete@titan.rice.edu Rice University knows nuttin about what I say, or what I do ... ===========================================================================