Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!cs.utexas.edu!uunet!iconsys!tom
From: tom@iconsys.UUCP (Tom Kimpton)
Newsgroups: comp.sys.mac.programmer
Subject: Re: Serious programming question
Message-ID: <413@iconsys.UUCP>
Date: 31 Oct 89 00:31:09 GMT
References: <89295.142753CXT105@PSUVM.BITNET>
Reply-To: tom@iconsys.UUCP (Tom Kimpton)
Organization: ICON International, Inc., Orem, UT
Lines: 49

In article pete@titan.rice.edu (Pete Keleher) writes:
>
>> What sort of data structures are generally used for representing text in an
>> editor or word-processor? Obviously TextEdit is an alternative, but it's slow
>> (as far as I've heard) and imposes size restrictions.
>
>Good question. I have looked at source for two editors and have written one
>of my own. The best data structure that I have heard of would be "chunks". A
>chunk would be a chunk of text, maybe 256 bytes, typically half full. The
>chunks are then linked together in a linked list. Insertion/deletion usually
>involves moving only some of the bytes in the current chunk. Sometimes, of
>course, you must split chunks or combine them. There is usually no array of
>line pointers, except maybe for the lines currently being displayed. Two
>important data structures could be declared something like:
>
>typedef struct Chunk {
>    struct Chunk *next, *prev;
>    short len;
>    char text[256];
>} Chunk;
>

I did something very similar, but used smaller chunks. Because it was
meant to be a program editor, I guesstimated that the average line would
be fairly short, and used 64-byte chunks, with sideways links. Joining
lines involved no data movement, because the display code used the length
of each chunk, and kept displaying while there were right links.
Insertion and deletion touched no more than a single chunk's data
(shifting characters right or left), with empty chunks being deleted and
new chunks being inserted when a chunk filled up.
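
For anyone who wants something concrete, here is a rough sketch in C of
the scheme just described. It is a reconstruction, not the original
editor's code. The names (CHUNK_SIZE, chunk_new, chunk_insert,
chunk_delete, line_join, line_read) are made up for illustration, and
splitting a full chunk in half is an assumption; the description above
only says that a new chunk gets inserted when one fills up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 64                /* chunk size from the description */

/* One piece of a line; a line is a chain of chunks joined sideways. */
typedef struct Chunk {
    struct Chunk *left, *right;      /* sideways links within one line */
    unsigned char len;               /* bytes in use; length-counted */
    unsigned char text[CHUNK_SIZE];
} Chunk;

/* Allocate an empty, unlinked chunk; NULL if out of memory. */
Chunk *chunk_new(void)
{
    return calloc(1, sizeof(Chunk));
}

/* Insert ch at offset pos (0..len) of c. A full chunk first hands its
   upper half to a new right neighbour (assumed policy).
   Returns 0 on success, -1 if allocation fails. */
int chunk_insert(Chunk *c, size_t pos, unsigned char ch)
{
    assert(pos <= c->len);
    if (c->len == CHUNK_SIZE) {
        size_t keep = CHUNK_SIZE / 2;
        Chunk *n = chunk_new();
        if (n == NULL)
            return -1;
        memcpy(n->text, c->text + keep, CHUNK_SIZE - keep);
        n->len = (unsigned char)(CHUNK_SIZE - keep);
        c->len = (unsigned char)keep;
        n->left = c;
        n->right = c->right;
        if (c->right != NULL)
            c->right->left = n;
        c->right = n;
        if (pos > keep) {            /* insertion point is now in n */
            c = n;
            pos -= keep;
        }
    }
    /* Only this one chunk's bytes are shifted right. */
    memmove(c->text + pos + 1, c->text + pos, c->len - pos);
    c->text[pos] = ch;
    c->len++;
    return 0;
}

/* Delete the byte at offset pos of c. A chunk that becomes empty and has
   a neighbour is unlinked and freed. Returns the (possibly new) head. */
Chunk *chunk_delete(Chunk *head, Chunk *c, size_t pos)
{
    assert(pos < c->len);
    memmove(c->text + pos, c->text + pos + 1, c->len - pos - 1);
    c->len--;
    if (c->len == 0 && (c->left != NULL || c->right != NULL)) {
        if (c->left != NULL)
            c->left->right = c->right;
        if (c->right != NULL)
            c->right->left = c->left;
        if (c == head)
            head = c->right;
        free(c);
    }
    return head;
}

/* Join two lines: link the tail of a to the head of b. No bytes move. */
void line_join(Chunk *a, Chunk *b)
{
    while (a->right != NULL)
        a = a->right;
    a->right = b;
    b->left = a;
}

/* Walk right links the way display code would, copying at most cap
   bytes into out. Returns the full length of the line. */
size_t line_read(const Chunk *c, unsigned char *out, size_t cap)
{
    size_t total = 0;
    for (; c != NULL; c = c->right) {
        size_t room = total < cap ? cap - total : 0;
        size_t n = c->len;
        if (n > room)
            n = room;
        if (n > 0)
            memcpy(out + total, c->text, n);
        total += c->len;
    }
    return total;
}
```

In this sketch a caller keeps a pointer to the head chunk of each line and
a (chunk, offset) pair for the cursor. After an insert splits a chunk, the
cursor has to be re-derived, since the byte under it may have moved to the
new right neighbour.
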

I didn't use C strings (null terminated) because I wanted to be able to
edit binary files. Also, with sideways links, there is no problem with
arbitrary-length lines (vi's "Line too long" problem -- arggh!).

Something I didn't do at the time, but would if doing it again, is to
have an indent field for auto-indentation. With the leading whitespace
kept as a count rather than as characters, you might be able to cut down
the size of the chunks.

This was for a single-font, single-size, etc. editor. If you wanted to
allow the use of different fonts, sizes, styles, etc., you might consider
using high-bit-set (non-ASCII) characters to indicate one of 128 "style"
records (more if you have your code use one or more following bytes).

Lots of fun. Editors are fun (well, I guess most any programming project
can be fun :-). Good luck with your program!

-- 
Tom Kimpton                  UUCP: {uunet,caeco,nrc-ut}!iconsys!tom
Software Engineer            INTERNET: tom@iconsys.uu.net
Icon International, Inc.     BITNET: icon%byuadam.bitnet (multi-user acct)
Orem, Utah 84058             PHONE: (801) 225-6888