Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!bfmny0!tneff
From: tneff@bfmny0.UU.NET (Tom Neff)
Newsgroups: comp.mail.mush
Subject: Huge folders
Message-ID: <15147@bfmny0.UU.NET>
Date: 2 Feb 90 14:42:58 GMT
Reply-To: tneff@bfmny0.UU.NET (Tom Neff)
Lines: 47

I too have noticed that folders tend to get huge.  When they do, Mush's
performance gets excruciatingly slow.  Loading them means scanning the
entire file searching for message starts.  This can take forever!

And yet breaking the folders up into smaller ones is an unsatisfying
solution, because you lose the ability to manipulate the entire
collection of messages with "pick," "sort" etc.  This is what we use
Mush for to begin with - hate to give it up.

So another idea occurs to me - how about INDEXING huge folders?  Storing
a list of start-of-message pointers in a separate index file (in the
same directory as the folder) would let you access a huge folder in
seconds.  The header fields Mush displays for the current screenful of
messages could be grabbed in a few seek-and-reads.  As you need other
headers, you go get them.  (Mush could even keep loading headers in the
background after displaying the current set and entering the shell in
the foreground.)

How it might work:

The user decides that the currently opened folder ("+mysave") is huge
and should be indexed.  He issues the Mush command:

	index

This creates "+X.mysave" containing start-of-message file pointers for
the folder.  Mush remembers the folder is indexed and will update the
index file whenever the folder itself is updated.

In a later Mush session the user selects folder "+mysave" and Mush
notices that "+X.mysave" also exists.  If it is newer than "+mysave"
then the index is loaded and its pointers used to achieve a fast "scan"
of the folder; only the needed messages are actually read from the
folder file.

If the index exists but is older than the folder file Mush gets smart.
If the old indices LOOK like they point to messages, and the folder is
just bigger, then Mush fast-scans the "old" portion and brute force
reads the "new" before display.  This is the normal case when a mail
delivery agent appends new messages to a folder.  But if the indices
look WRONG now (indicating that somebody edited or otherwise touched the
folder with some other program since the last Mush session), Mush warns
the user "Index obsolete - rebuild? [y]" and prompts.  (I haven't
thought in any depth about what Mush does if the user answers "no", but
clearly Mush doesn't use the index.)

A final optimization for huge folders would be to update the original
file IN PLACE if the user's changes don't require moving any text
around, e.g., deleting new messages while leaving old ones untouched.  I
realize not everyone's OS permits this, but it would make a nice compile
time switch.
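
To make the indexing idea a bit more concrete, here is a rough sketch of
the three pieces described above: the one-time full scan, the "do the
old indices still LOOK right?" check, and the brute force read of only
the newly appended part.  It is written in Python rather than Mush's C
purely for brevity, and it is not Mush code.  Everything named here
(build_index, check_index, extend_index, indexed_size) is made up for
illustration.  Two assumptions go beyond the proposal itself: that a
message starts at a line beginning with "From " (the usual mbox
convention), and that the index also records the folder size at indexing
time, which the proposal does not mention (it only compares file dates).

```python
import os

def build_index(folder_path):
    """Full scan: return the byte offset of every message start.

    Assumption: a message starts at a line beginning with b"From ".
    """
    offsets = []
    pos = 0
    with open(folder_path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    return offsets

def looks_like_message_start(f, offset):
    """Cheap check: does this offset begin a line that starts with 'From '?"""
    if offset > 0:
        f.seek(offset - 1)
        if f.read(1) != b"\n":
            return False
    f.seek(offset)
    return f.read(5) == b"From "

def check_index(folder_path, offsets, indexed_size):
    """Classify a stored index as 'current', 'appended' or 'obsolete'.

    indexed_size is the folder size when the index was written
    (a hypothetical extra field stored alongside the offsets).
    """
    size = os.path.getsize(folder_path)
    if size < indexed_size:
        return "obsolete"          # folder shrank: something rewrote it
    with open(folder_path, "rb") as f:
        for off in offsets:
            if off >= size or not looks_like_message_start(f, off):
                return "obsolete"  # an old pointer no longer hits a message
    return "current" if size == indexed_size else "appended"

def extend_index(folder_path, offsets, indexed_size):
    """Scan only the bytes appended after indexed_size; keep old offsets."""
    new_offsets = list(offsets)
    pos = indexed_size
    with open(folder_path, "rb") as f:
        f.seek(indexed_size)
        for line in f:
            if line.startswith(b"From "):
                new_offsets.append(pos)
            pos += len(line)
    return new_offsets
```

On an "obsolete" result the caller would put up the rebuild prompt and,
if the user agrees, simply run build_index again.  Note that the check
still touches every old offset, but each probe is one short seek-and-read
rather than a read of the whole message, which is where the time is
saved.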