Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!decwrl!pyramid!pesnta!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.mail
Subject: Re: Pathalias/uumail: some algorithms and questions
Message-ID: <1967@peora.UUCP>
Date: Mon, 10-Feb-86 19:51:58 EST
Article-I.D.: peora.1967
Posted: Mon Feb 10 19:51:58 1986
Date-Received: Wed, 12-Feb-86 07:43:01 EST
References: <122@delftcc.UUCP> <1954@peora.UUCP> <1958@peora.UUCP>
Organization: Concurrent Computer Corporation, Orlando, Fl
Lines: 119

This is the third (and possibly last) posting in a series explaining
"opath"'s routing scheme.  The previous two articles are <1954@peora.UUCP>
and <1958@peora.UUCP>.  I have written these articles with little
provocation (and probably with few readers, and even fewer who agree)
because for the past year I have been persistent in expounding on the
way I think UUCP mail should work; I am not one to hold to a position
in the face of widespread disagreement unless I have thought through my
position carefully, and thus wanted to explain the rationale.


The final major tenet behind opath, which actually ties the previous two
postings together, involves an abstract model for how the mail is
delivered.

A user at some originating site o writes a message, which is a string of
characters S.  He intends to send it to some destination site d, where it
will be read.  Hopefully, the format for messages will agree, possibly
after some trivial transformation (e.g., converting LFs to CR/LFs) is made
on the message, between the two sites.  The extent to which they *have* to
agree depends on the complexity of the programs originating and receiving
them; this is a strong argument for simplicity of the programs (call them
"mailers") at each end, but there are also a lot of beneficial things that
such mailers can do (for example, automatically generating a reply) if
a certain level of complexity is permitted.

Presently there are (at least) two major formats for messages.  One, which
is an extension of the original Unix mailers, treats a message as simply
a string of text beginning with the characters "From ", with messages
separated by a blank line, so that any message in a file other than the
first will actually begin with the string "\nFrom ".  (It is because of this
standard that many Unix mailers insert the character ">" at the start of
any paragraph beginning with the word "From".)  This is an old standard
that is not very compatible with mailers elsewhere.
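
A minimal sketch of the "From " convention just described, written in
Python purely for illustration.  The function names are hypothetical, and
treating the start of the body as the start of a paragraph is an
assumption; real mailers differ in the details.

```python
def escape_from(body: str) -> str:
    """Insert '>' before any paragraph that begins with 'From '.

    A paragraph start is taken to be the first line of the body or any
    line that follows a blank line (an assumption for this sketch).
    """
    out = []
    prev_blank = True
    for line in body.split("\n"):
        if prev_blank and line.startswith("From "):
            line = ">" + line
        out.append(line)
        prev_blank = (line == "")
    return "\n".join(out)


def split_messages(mailbox: str) -> list[str]:
    """Split a mailbox file at each 'From ' line that follows a blank line."""
    messages: list[list[str]] = []
    prev_blank = True
    for line in mailbox.split("\n"):
        if prev_blank and line.startswith("From "):
            # The blank separator line belongs to neither message.
            if messages and messages[-1] and messages[-1][-1] == "":
                messages[-1].pop()
            messages.append([line])
        elif messages:
            messages[-1].append(line)
        prev_blank = (line == "")
    return ["\n".join(m) for m in messages]
```

Without the escaping step, a body paragraph that happens to begin with
"From " is indistinguishable from the start of the next message, so a
reader built like split_messages() would cut the message in two.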

The second standard is RFC822.  RFC822 is a standard for the format of
mail messages, although it is usually discussed in net.mail in the context
only of mail addresses.

In any case, if the originating and destination mailers agree on a message
format, that should be sufficient; all that is necessary is to get the
message there.  To do this requires what we usually call, in here, a
"transport mechanism".

Ideally, the transport mechanism should be entirely distinct from the
mailers.  It should be as capable of sending arbitrary files (e.g.,
binary object files) between sites as it is of sending mail messages.
It shouldn't know, and shouldn't be required to know, either that it is
transmitting a mail message or what the format of the message is.  In
the domain of
well-defined networks, this is accomplished by defining a series of
"layers"; messages consist of a block of data, whose meaning is
unimportant to the software at a given layer, encapsulated in an
"envelope" (whose meaning *is* important to the software at that layer)
describing how to deliver the message, along with information for
validating that nothing has been lost out of the message (e.g., a
checksum or CRC).  At the
next higher layer, this envelope is itself treated as data, along with all
the rest of the data for the message, and another envelope is put around
that.  The software at this next-higher layer doesn't even know where
the envelope for the layer below it ends, and the message begins.
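
The nesting of envelopes can be sketched as follows.  This is only an
illustration under assumed conventions: the frame layout (a two-byte
address length, a CRC-32 of the payload, the address, then the payload)
and the names wrap() and unwrap() are invented here and do not describe
any actual network protocol.

```python
import struct
import zlib

_HEADER = "!HI"  # address length (2 bytes), CRC-32 of the payload (4 bytes)


def wrap(address: bytes, payload: bytes) -> bytes:
    """Put an envelope around a payload without looking inside it."""
    header = struct.pack(_HEADER, len(address), zlib.crc32(payload))
    return header + address + payload


def unwrap(frame: bytes) -> tuple[bytes, bytes]:
    """Read one envelope; return its address and the opaque payload."""
    addr_len, crc = struct.unpack_from(_HEADER, frame)
    start = struct.calcsize(_HEADER)
    address = frame[start:start + addr_len]
    payload = frame[start + addr_len:]
    if zlib.crc32(payload) != crc:
        raise ValueError("payload damaged in transit")
    return address, payload
```

Calling wrap(b"peora", wrap(b"jer", b"Hello")) produces two nested
envelopes.  The outer unwrap() sees only the address b"peora" and a run
of bytes; it cannot tell where the inner envelope ends and the message
begins, and it does not need to.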

Since only the envelope is interpreted, the meaning of the data is
unimportant, and no meaning is even defined for it.  If the information
in the envelope on routing is considered a language, then it is not
necessary that the languages at two different layers be in any way
compatible, as long as the integrity of the message/envelope distinction
is not violated.

This idea is fundamental to my arguments in favor of a distinct routing
language for UUCP.  It is in fact the case that in System V UUCP, the
transport mechanism can deliver arbitrary data files without awareness
of their contents, across many "hops".  In prior UUCPs, the transport
mechanism could only deliver the message across one "hop", i.e., to a
neighboring site, after which a program (rmail) had to be run to
decide where to send the message next.  This is where the trouble
started, since it provided the potential for circumventing the
distinction between the message and its envelope.

But, in fact, this was not done in standard Unix.  A routing language as
described previously was used; each rmail was given a string in the
language, it took off the <nextsite> part, and delivered the message,
along with an envelope consisting of the <uninterpreted> part, to
the site named by <nextsite>.  It also prepended a "routing stamp" to
the front of the message it was delivering; although the routing
string was in a separate file from the message and routing stamp, the
routing string and routing stamp can be considered the envelope,
as distinct from the message body.  In this way, the message can
be delivered without tampering with the message body; and, as discussed
in the previous posting on the routing language, the message can even
be moved across transport mechanisms (e.g., between the ARPAnet and UUCP
network) without problems, as long as the receiving transport mechanism
accepts a string in the form of <interpreted> as an instruction on how
to deliver the message.
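
One rmail step of the kind described above might be sketched like this.
The sketch assumes the routing string has the UUCP form
<nextsite>!<uninterpreted>; the stamp text "Received-via" and the names
rmail_step() and deliver() are made up for illustration and are not the
format or interface of any real rmail.

```python
from typing import NamedTuple


class Delivery(NamedTuple):
    next_site: str  # site to hand the message to, or "" for local delivery
    route: str      # what is left of the routing string, uninterpreted
    message: str    # routing stamps followed by the untouched body


def rmail_step(route: str, message: str, here: str) -> Delivery:
    """Interpret only the first component of the routing string."""
    next_site, sep, rest = route.partition("!")
    if not sep:
        # No "!" left: the string names a user at this site.
        return Delivery("", route, message)
    stamp = f"Received-via {here}\n"  # illustrative stamp format
    return Delivery(next_site, rest, stamp + message)


def deliver(route: str, message: str, origin: str) -> tuple[str, str, str]:
    """Follow the route hop by hop; return (final site, user, message)."""
    here = origin
    while True:
        step = rmail_step(route, message, here)
        if not step.next_site:
            return here, step.route, step.message
        here, route, message = step.next_site, step.route, step.message
```

Each hop reads only <nextsite> and adds a stamp in front; the remainder
of the routing string and the message body pass through unchanged, which
is the envelope/message separation at issue.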

The problem, and the source of much debate and confusion, arises when the
envelope/message distinction is not maintained.  This is
especially easy to do when Sendmail is used to process the messages,
since Sendmail provides nothing to prevent the combining of the two
other than careful discipline.  Fortunately, the "interpret the routing
string in the context in which it was delivered" method provided by
Gene Spafford does preserve that distinction.

In reality, of course, mailers do make changes to the message.  The
main change they make is to add lines telling how the message was
delivered ("Received:" lines).  Unfortunately, some also make other
changes; I have argued in the past that this is a result of confusing
the routing language with the language used to define the standard format
for the message; i.e., making the assumption (which I have claimed is
incorrect) that because the envelope uses one language, the message must
also comply with it (or vice versa).  It is my contention, based on the
model given above, that no such compliance is needed; and furthermore,
that since the original Unix mailers had a very trivial definition of
the structure of the message itself, the message can be made to comply
with RFC822, and thus with other RFC822-compliant networks, without the
great deal of confusion that now exists over how to do so, and without
making the message (while in the domain of UUCP) non-compliant with
RFC822.
-- 
UUCP: Ofc:  jer@peora.UUCP  Home: jer@jerpc.CCUR.UUCP  CCUR DNS: peora, pesnta
  US Mail:  MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
	    2486 Sand Lake Road, Orlando, FL 32809-7642     xxxxx4xxx