Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!bloom-beacon!CSE.OGI.EDU!schaefer
From: schaefer@CSE.OGI.EDU (Barton E. Schaefer)
Newsgroups: comp.mail.mush
Subject: Re: compressed folders revisited
Message-ID: <9002090129.AA23568@cse.ogi.edu>
Date: 9 Feb 90 01:29:38 GMT
Sender: daemon@athena.mit.edu (Mr Background)
Organization: The Internet
Lines: 114

In <1990Feb8.184406.2683@ux1.cso.uiuc.edu> Kurt Hirchert wrote:
} Subject: compressed folders revisited (long)
}
} From the discussion, it would appear that mush performs the following
} low level operations on a mail file:
} 1. reading the entire file sequentially

Correct. It also copies the entire file to a temporary location, which
can be freely modified without altering the original file. That's why,
for example, if you "merge" folders or run the "edit" command on a
message and then exit without updating, the original file is unchanged.

} 2. reading selected sequences of bytes from the file randomly -- the
}    size and location of these sequences can apparently be identified
}    during the sequential pass

Yes. Location but not size at the moment, though I suppose it could be
worked out.

} 3. appending data (of known size?) to the end of the file

True, on "merge" (including "undigest -m"). The size is not necessarily
known. On update, it may also add information in the middle of the file
(the "Status:" header that so annoys Berkeley-mail-haters).

} 4. creating a new file

Only on "save" and friends, unless you count the temp file created when
loading the folder.

} 5. renaming a new file to replace an old file

Mush *never* does this. It either overwrites and truncates (on systems
that have ftruncate()) or truncates and rewrites (everywhere else) the
original file.

} 6. deleting an old file

This happens only for the temp files or for folders that are found at
update time to be empty (all messages deleted).
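To make the answer to item 5 concrete, here is a rough sketch of the
"overwrite and truncate" style of update on a system that has
ftruncate(). It is not taken from the mush sources: the function name
update_in_place() and its two path arguments are invented for
illustration, and locking and error reporting are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a real mush function: copy the working temp
 * file back over the original folder, then cut off whatever is left of
 * the old contents.  The folder is opened without O_CREAT or O_TRUNC,
 * so the existing file is reused rather than replaced by a rename.
 * Returns 0 on success, -1 on any error. */
int update_in_place(const char *folder_path, const char *temp_path)
{
    char buf[4096];
    ssize_t n;
    off_t total = 0;
    int rc = -1;
    int in, out;

    assert(folder_path != NULL && temp_path != NULL);

    in = open(temp_path, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(folder_path, O_WRONLY);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* Overwrite from the start of the folder. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0)
                goto finish;
            done += w;
        }
        total += n;
    }
    if (n < 0)
        goto finish;

    /* Drop any old bytes beyond the end of the new contents. */
    if (ftruncate(out, total) == 0)
        rc = 0;

finish:
    close(in);
    close(out);
    return rc;
}
```

Where ftruncate() is missing, the "truncate and rewrite" variant would
presumably open the folder with O_TRUNC and then do the same copy loop,
at the cost of a window in which the folder is empty on disk.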

} What I am suggesting is that it might be possible for mush to recognize
} that some classes of mail files require special processing and to fork
} a new process to do that special processing.

This might be fine when it is possible to fork a second process. Mush
does run on non-UNIX systems, though ....

} The process could accept
} "commands" and data through stdin and produce status information and
} data on stdout.

The other problem with this is getting everything connected
appropriately to avoid deadlock. See the recent discussion on
comp.unix.questions. Not to imply that it's impossible.

When mush is fixed so it doesn't need backward seeks to load a folder,
a trivial way to implement a subset of this idea would be to have mush
check the "folder" it is loading for execute permission. If execute is
set, mush runs the "folder" as a command, perhaps giving it "-r" as an
argument, and snarfs into its temp file from the output of that command.
Similarly, on any "save" or "update" whose target turns out to be an
executable file, it could run it with "-a" or "-w" as appropriate, and
write to the standard input of the command. Then it's up to the "folder"
to correctly modify "itself".

But the idea of having the process hang around to be both read from and
written to, as a replacement for the temp file, doesn't strike me as the
best solution.

} Assuming
} that both the file containing the process and the conditions under
} which it is invoked are made user specifiable, mush can be extended to
} handle new mail storage formats to write such a process, without Dan
} and Bart's intervention.

"User specifiable" might be a problem, except with a scheme like testing
the executable bit in the file permissions. It might, however, be
possible to make it compile-time specifiable. In fact, it might be
possible to link the file-manipulation functions directly into mush, as
is done with some of the MMDF functions now, and not need separate
processes at all.
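The execute-permission test described above could be sketched along the
following lines for the load ("-r") case. Nothing here exists in mush:
folder_is_program() and open_folder_for_load() are made-up names, the
use of popen() is just one way to run the "folder", and the quoting is
deliberately naive (paths containing a single quote are refused).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: returns 1 if path is a regular file that the user may
 * execute, 0 otherwise.  The regular-file check keeps directories,
 * which normally have execute (search) permission, from qualifying. */
int folder_is_program(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    return S_ISREG(st.st_mode) && access(path, X_OK) == 0;
}

/* Hypothetical: open a folder for the sequential load pass.  A plain
 * file is opened directly; an executable "folder" is run with "-r" and
 * its standard output is read instead.  *is_pipe tells the caller
 * whether to finish with pclose() (1) or fclose() (0). */
FILE *open_folder_for_load(const char *path, int *is_pipe)
{
    char cmd[1024];
    int len;

    assert(path != NULL && is_pipe != NULL);
    *is_pipe = 0;

    if (!folder_is_program(path))
        return fopen(path, "r");

    if (strchr(path, '\'') != NULL)
        return NULL;            /* naive quoting cannot handle this */

    /* Prefix "./" when there is no slash so that the shell does not
     * search PATH for the folder name. */
    len = snprintf(cmd, sizeof cmd, "'%s%s' -r",
                   strchr(path, '/') != NULL ? "" : "./", path);
    if (len < 0 || (size_t)len >= sizeof cmd)
        return NULL;

    *is_pipe = 1;
    return popen(cmd, "r");
}
```

Because the stream returned for an executable folder is a pipe, it can
only be read forward, which is why this depends on the load no longer
needing backward seeks. The "-a" and "-w" cases would be the mirror
image, opening the command for writing instead of reading.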

Another difficulty with one process for both input and output is that we
want future mushes to be able to handle several folders at once, so you
can move messages around from one to the other or create new folders of
messages taken from one or more others. Furthermore, all of this should
be able to happen "virtually", that is, without actually changing the
underlying "real" folders until updates are done. If you are forced to
fork one process, keep it running, and connect two file descriptors to
it, for every folder you access, you're going to be in real trouble.

I don't know the LZW encoding algorithm well enough to comment on the
example given. It sounds reasonable, except for the general arguments
I've been mentioning; how would those change things?

} Example 3 - Directory as Mail File

In connection with having multiple folders open, there's going to be
support for locating different messages in different files, which pretty
much takes care of the directory-as-folder problem -- you just treat it
as a whole bunch of one-message folders, all open at the same time.

} OK folks - how does this approach sound to you?

Sounds promising but in need of more work. For example, another planned
addition to mush is "index" files that store the seek offsets currently
determined during the read pass; with this info stored ahead of time,
the requirements for the read "process" are simplified.

--
Bart Schaefer    "February. The hangnail on the big toe of the year."
                                                          -- Duffy
schaefer@cse.ogi.edu (used to be cse.ogc.edu)