Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!bloom-beacon!CSE.OGI.EDU!schaefer
From: schaefer@CSE.OGI.EDU (Barton E. Schaefer)
Newsgroups: comp.mail.mush
Subject: Re: compressed folders revisited
Message-ID: <9002090129.AA23568@cse.ogi.edu>
Date: 9 Feb 90 01:29:38 GMT
Sender: daemon@athena.mit.edu (Mr Background)
Organization: The Internet
Lines: 114

In <1990Feb8.184406.2683@ux1.cso.uiuc.edu> Kurt Hirchert wrote:
} Subject: compressed folders revisited (long)
}
} From the discussion, it would appear that mush performs the following
} low level operations on a mail file:
} 1. reading the entire file sequentially

Correct.  It also copies the entire file to a temporary location, which
can be freely modified without altering the original file.  That's why,
for example, if you "merge" folders or run the "edit" command on a
message and then exit without updating, the original file is unchanged.
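A minimal Python sketch of the load step described above. It assumes the whole folder is simply copied byte for byte into a private temp file. The names `load_folder` and the "mushtmp." prefix are hypothetical and are not taken from mush's own C code.

```python
import os
import shutil
import tempfile

def load_folder(folder_path):
    """Copy the whole folder into a private temp file; return the temp path.

    Later changes (merge, edit, delete) go to the copy, so quitting without
    an update leaves the original folder unchanged.
    """
    with open(folder_path, "rb") as src:
        fd, temp_path = tempfile.mkstemp(prefix="mushtmp.")
        with os.fdopen(fd, "wb") as dst:
            shutil.copyfileobj(src, dst)  # one sequential pass over the folder
    return temp_path
```

Any edit written to the returned path is invisible in the original until something explicitly writes it back.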

} 2. reading selected sequences of bytes from the file randomly -- the
}    size and location of these sequences can apparently be identified
}    during the sequential pass

Yes.  Location but not size at the moment, though I suppose it could
be worked out.
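The sketch below shows how sizes could be "worked out" from locations. It assumes the classic mbox convention that each message starts with a line beginning "From ". One sequential pass records every message's start offset, and each size is then just the distance to the next offset (or to end of file). `index_messages` and `read_message` are hypothetical names.

```python
def index_messages(path):
    """Sequential pass: return (offsets, sizes) for every message in an mbox file."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                offsets.append(pos)  # byte where this message begins
            pos += len(line)
    # Size of message i = start of message i+1 (or end of file) minus its start.
    sizes = [end - start for start, end in zip(offsets, offsets[1:] + [pos])]
    return offsets, sizes

def read_message(path, offset, size):
    """Random access: seek directly to one message using the recorded offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```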

} 3. appending data (of known size?) to the end of the file

True, on "merge" (including "undigest -m").  The size is not necessarily
known.  On update, it may also add information in the middle of the file
(the "Status:" header that so annoys Berkeley-mail-haters).
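Appending during "merge" does not require knowing the size in advance. A minimal sketch, assuming the incoming messages arrive as any readable binary stream: it copies in fixed-size chunks until EOF and counts the bytes as it goes. `merge_into` and `chunk_size` are illustrative names only.

```python
def merge_into(folder_path, source, chunk_size=8192):
    """Append everything readable from `source` to the end of a folder.

    The total length is unknown up front; the stream is copied chunk by
    chunk until EOF. Returns the number of bytes appended.
    """
    appended = 0
    with open(folder_path, "ab") as dst:  # append mode: writes go to end of file
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            appended += len(chunk)
    return appended
```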

} 4. creating a new file

Only on "save" and friends, unless you count the temp file created when
loading the folder.

} 5. renaming a new file to replace an old file

Mush *never* does this.  It either overwrites and truncates (on systems
that have ftruncate()) or truncates and rewrites (everywhere else) the
original file.
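The two strategies above can be sketched in Python, whose `os.ftruncate` wraps the same system call. Neither strategy renames a new file over the old one, so the folder keeps its inode, owner and permissions. The function names are hypothetical.

```python
import os

def rewrite_in_place(folder_path, new_contents):
    """Overwrite then truncate (systems with ftruncate()).

    Write the new data from offset 0, then cut off whatever tail of the
    old, longer file is left over.
    """
    with open(folder_path, "r+b") as f:  # open existing file without truncating
        f.write(new_contents)
        f.flush()  # push buffered data to the fd before truncating
        os.ftruncate(f.fileno(), len(new_contents))

def rewrite_truncate_first(folder_path, new_contents):
    """Truncate then rewrite (fallback where ftruncate() is unavailable)."""
    with open(folder_path, "wb") as f:  # "w" truncates to zero length on open
        f.write(new_contents)
```

The first form never leaves the folder empty. With the second, a crash between the truncate and the write can lose the folder's contents.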

} 6. deleting an old file

This happens only for the temp files or for folders that are found at
update time to be empty (all messages deleted).
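A hypothetical end-of-update step in Python, covering the only two deletions described above. The private temp copy is always removed. The folder itself is removed only when no messages survive, and otherwise is truncated and rewritten. `finish_update` and its arguments are illustrative, not mush's interface.

```python
import os

def finish_update(folder_path, temp_path, kept_messages):
    """Write back surviving messages; return True if the folder still exists."""
    os.remove(temp_path)  # the private temp copy is no longer needed
    if not kept_messages:
        os.remove(folder_path)  # every message was deleted: drop the folder
        return False
    with open(folder_path, "wb") as f:  # truncate and rewrite the original
        f.write(b"".join(kept_messages))
    return True
```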

} What I am suggesting is that it might be possible for mush to recognize
} that some classes of mail files require special processing and to fork
} a new process to do that special processing.

This might be fine when it is possible to fork a second process.  Mush
does run on non-UNIX systems, though ....

} The process could accept
} "commands" and data through stdin and produce status information and
} data on stdout.

The other problem with this is getting everything connected appropriately
to avoid deadlock.  See the recent discussion on comp.unix.questions.
Not to imply that it's impossible.
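The deadlock can be illustrated with a helper process that both reads stdin and writes stdout. Suppose the parent writes all its data before reading anything. Once both pipe buffers fill, the parent blocks on its write while the child blocks writing output that nobody is reading. In Python, `subprocess.Popen.communicate()` avoids this by servicing both pipes at once. This sketch covers only a single request/response exchange. A long-lived helper used in place of the temp file would need select()/poll() or threads, which is the extra complexity at issue here. `run_filter` is a hypothetical name.

```python
import subprocess

def run_filter(argv, data):
    """Feed `data` to a helper's stdin and return everything it writes to stdout."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes and reads concurrently, so neither side can stall
    # on a full pipe buffer the way a naive write-all-then-read-all loop can.
    out, _ = proc.communicate(data)
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with status {proc.returncode}")
    return out
```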

When mush is fixed so it doesn't need backward seeks to load a folder, a
trivial way to implement a subset of this idea would be to have mush
check the "folder" it is loading for execute permission.  If execute is
set, mush runs the "folder" as a command, perhaps giving it "-r" as an
argument, and snarfs into its temp file from the output of that command.
Similarly, on any "save" or "update" whose target turns out to be an
executable file, it could run it with "-a" or "-w" as appropriate, and
write to the standard input of the command.  Then it's up to the "folder"
to correctly modify "itself".  But the idea of having the process hang
around to be both read from and written to, as a replacement for the temp
file, doesn't strike me as the best solution.
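A minimal sketch of the proposed execute-bit scheme, under the flag conventions suggested above ("-r" to read, "-a" to append, "-w" to rewrite). Those flags are only this proposal, not an existing interface, and the function names are hypothetical. The command's output is read sequentially, which is why the loader must no longer depend on seeking backward in its input.

```python
import os
import subprocess

def load_folder_data(path):
    """Return folder contents, running the "folder" with -r if it is executable."""
    if os.access(path, os.X_OK):
        result = subprocess.run([path, "-r"], stdout=subprocess.PIPE, check=True)
        return result.stdout  # snarfed from the command's standard output
    with open(path, "rb") as f:
        return f.read()

def save_to_folder(path, data, append=True):
    """Append (-a) or rewrite (-w); an executable folder gets the data on stdin."""
    if os.access(path, os.X_OK):
        flag = "-a" if append else "-w"
        subprocess.run([path, flag], input=data, check=True)
    else:
        with open(path, "ab" if append else "wb") as f:
            f.write(data)
```

How the executable "folder" stores what it receives on "-a" or "-w" is entirely up to it, which is the point of the scheme.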

} Assuming
} that both the file containing the process and the conditions under
} which it is invoked are made user specifiable, mush can be extended to
} handle new mail storage formats by writing such a process, without Dan
} and Bart's intervention.

"User specifiable" might be a problem, except with a scheme like testing
the executable bit in the file permissions.  It might, however, be
possible to make it compile-time specifiable.  In fact, it might be
possible to link the file-manipulation functions directly into mush, as
is done with some of the MMDF functions now, and not need separate
processes at all.
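One way to picture "linking the file-manipulation functions directly in" is a table of per-format operations fixed when the program is built. This Python sketch is an analogy under that assumption: `FolderFormat`, `FORMATS` and the format names are hypothetical. The gzip entry ties back to compressed folders and relies on a real property of gzip files: appending adds a new compressed member, and readers decompress all members in order.

```python
import gzip
from typing import Callable, Dict, NamedTuple

class FolderFormat(NamedTuple):
    """Hypothetical table of per-format operations linked into the program."""
    read: Callable[[str], bytes]
    append: Callable[[str, bytes], None]

def _plain_read(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

def _plain_append(path: str, data: bytes) -> None:
    with open(path, "ab") as f:
        f.write(data)

def _gzip_read(path: str) -> bytes:
    # gzip.open decompresses every member of a multi-member file in order.
    with gzip.open(path, "rb") as f:
        return f.read()

def _gzip_append(path: str, data: bytes) -> None:
    # "ab" adds a new compressed member instead of recompressing the file.
    with gzip.open(path, "ab") as f:
        f.write(data)

# Fixed at build time: supporting a new storage format means adding an
# entry here rather than starting a helper process for every folder.
FORMATS: Dict[str, FolderFormat] = {
    "mbox": FolderFormat(_plain_read, _plain_append),
    "gzip": FolderFormat(_gzip_read, _gzip_append),
}

def read_folder(path: str, fmt: str = "mbox") -> bytes:
    return FORMATS[fmt].read(path)

def append_folder(path: str, data: bytes, fmt: str = "mbox") -> None:
    FORMATS[fmt].append(path, data)
```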

Another difficulty with one process for both input and output is that we
want future mushes to be able to handle several folders at once, so you
can move messages around from one to the other or create new folders of
messages taken from one or more others.  Furthermore, all of this should
be able to happen "virtually", that is, without actually changing the
underlying "real" folders until updates are done.  If you are forced to
fork one process, keep it running, and connect two file descriptors to
it, for every folder you access, you're going to be in real trouble.

I don't know the LZW encoding algorithm well enough to comment on the
example given.  It sounds reasonable, except for the general arguments
I've been mentioning; how would those change things?

} Example 3 - Directory as Mail File

In connection with having multiple folders open, there's going to be
support for locating different messages in different files, which pretty
much takes care of the directory-as-folder problem -- you just treat it
as a whole bunch of one-message folders, all open at the same time.
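A small sketch of the directory-as-folder idea: each regular file in the directory is treated as a one-message folder, and all of them are loaded together. The numeric ordering follows the MH convention of naming message files 1, 2, 3, and so on. That convention is an assumption here, as is the name `load_directory_folder`.

```python
import os

def _order(name):
    # Numeric names sort numerically ("2" before "10"); other names follow.
    return (0, int(name), "") if name.isdecimal() else (1, 0, name)

def load_directory_folder(dir_path):
    """Return (path, message bytes) for each one-message file, in order."""
    messages = []
    for name in sorted(os.listdir(dir_path), key=_order):
        path = os.path.join(dir_path, name)
        if os.path.isfile(path):  # skip subdirectories and the like
            with open(path, "rb") as f:
                messages.append((path, f.read()))
    return messages
```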

} OK folks - how does this approach sound to you?

Sounds promising but in need of more work.  For example, another planned
addition to mush is "index" files that store the seek offsets currently
determined during the read pass; with this info stored ahead of time, the
requirements for the read "process" are simplified.
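A minimal sketch of such an index file, under stated assumptions. The index lives beside the folder with a hypothetical ".idx" suffix and stores the message start offsets as JSON. Recording the folder's size and modification time is an added staleness check: if either has changed since the index was written, the caller falls back to the full read pass.

```python
import json
import os

def index_path_for(folder_path):
    # Hypothetical naming convention for the index file.
    return folder_path + ".idx"

def write_index(folder_path, offsets):
    """Save message start offsets along with the folder's size and mtime."""
    st = os.stat(folder_path)
    record = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "offsets": offsets}
    with open(index_path_for(folder_path), "w") as f:
        json.dump(record, f)

def read_index(folder_path):
    """Return saved offsets, or None if the index is missing, corrupt, or stale."""
    try:
        with open(index_path_for(folder_path)) as f:
            record = json.load(f)
    except (OSError, ValueError):  # no index yet, or unreadable JSON
        return None
    st = os.stat(folder_path)
    if record.get("size") != st.st_size or record.get("mtime_ns") != st.st_mtime_ns:
        return None  # folder changed since indexing: rescan it
    return record.get("offsets")
```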


-- 
Bart Schaefer          "February.  The hangnail on the big toe of the year."
                                                                    -- Duffy

schaefer@cse.ogi.edu (used to be cse.ogc.edu)