Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!decwrl!pyramid!pesnta!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.mail
Subject: Re: Pathalias/uumail: some algorithms and questions
Message-ID: <1967@peora.UUCP>
Date: Mon, 10-Feb-86 19:51:58 EST
Article-I.D.: peora.1967
Posted: Mon Feb 10 19:51:58 1986
Date-Received: Wed, 12-Feb-86 07:43:01 EST
References: <122@delftcc.UUCP> <1954@peora.UUCP> <1958@peora.UUCP>
Organization: Concurrent Computer Corporation, Orlando, Fl
Lines: 119

This is the third (and possibly last) posting in a series explaining the routing scheme used by "opath". The previous two articles are <1954@peora.UUCP> and <1958@peora.UUCP>. I have written these articles with little provocation (and probably with few readers, and even fewer who agree) because for the past year I have persistently argued for the way I think UUCP mail should work. I am not one to hold a position in the face of widespread disagreement unless I have thought it through carefully, and so I wanted to explain the rationale.

The final major tenet behind opath, which ties the previous two postings together, is an abstract model of how mail is delivered. A user at some originating site o writes a message, which is a string of characters S. He intends to send it to some destination site d, where it will be read. Ideally, the two sites will agree on the format of messages, possibly after some trivial transformation of the message (e.g., converting LFs to CR/LFs).
The extent to which they *have* to agree depends on the complexity of the programs that originate and receive them (call them "mailers"). This is a strong argument for keeping the mailers at each end simple, but mailers can also do a lot of useful things (for example, automatically generating a reply) if a certain level of complexity is permitted.

Presently there are (at least) two major formats for messages. One, an extension of the original Unix mailers, treats a message as simply a string of text beginning with the characters "From ", with messages separated by a blank line, so that any message in a file other than the first actually begins with the string "\nFrom ". It is because of this convention that many Unix mailers insert the character ">" at the start of any line (typically the first line of a paragraph) that begins with the word "From". This is an old standard that is not very compatible with mailers elsewhere. The second standard is RFC822. RFC822 is a standard for the format of mail messages, although in net.mail it is usually discussed only in the context of mail addresses.

In any case, if the originating and destination mailers agree on a message format, that should be sufficient; all that is necessary is to get the message there. This requires what we usually call, in this newsgroup, a "transport mechanism". Ideally, the transport mechanism should be entirely distinct from the mailers. It should be just as capable of sending arbitrary files between sites (e.g., binary object files) as mail messages. It should neither know nor need to know that it is transmitting a mail message, or what the format of that message is.
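The "From " mailbox convention described above can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest variant, in which a single ">" is prepended to any body line starting with "From " (already-escaped ">From" lines are left alone), and it splits messages on any unescaped "From " line without also checking for the preceding blank line. The helper names escape_body, append_message, and split_messages are hypothetical, not taken from any Unix mailer.

```python
def escape_body(body: str) -> str:
    """Prefix ">" to every body line that starts with "From " (simple variant)."""
    lines = body.split("\n")
    return "\n".join(">" + ln if ln.startswith("From ") else ln for ln in lines)


def append_message(mailbox: str, sender: str, date: str, body: str) -> str:
    """Append one message: a "From " line plus the escaped body.

    Each message ends with a newline; a blank line separates it from the
    previous one, so every message after the first follows "\\nFrom ".
    """
    msg = f"From {sender} {date}\n{escape_body(body)}\n"
    return mailbox + "\n" + msg if mailbox else msg


def split_messages(mailbox: str) -> list[str]:
    """Start a new message at each line beginning with "From ".

    Escaped body lines (">From ...") do not match, which is the whole
    point of the escaping convention.
    """
    messages: list[str] = []
    current: list[str] = []
    for line in mailbox.split("\n"):
        if line.startswith("From ") and current:
            messages.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("\n".join(current))
    return messages
```

Without escape_body, a body line such as "From here on, ..." would be mistaken for the start of a new message when the file is read back, which is why mailers following this convention alter the body at all.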
In the domain of well-defined networks, this is accomplished by defining a series of "layers". A message consists of a block of data, whose meaning is unimportant to the software at a given layer, encapsulated in an "envelope" (whose meaning *is* important to the software at that layer) that describes how to deliver the message, along with information for verifying that nothing has been lost from the message (e.g., a checksum or CRC). At the next lower layer, this envelope is itself treated as data, along with all the rest of the data for the message, and another envelope is put around that. The software at this next-lower layer doesn't even know where the envelope from the layer above it ends and the message begins. Since only the envelope is interpreted, the meaning of the data is unimportant, and no meaning is even defined for it. If the routing information in the envelope is considered a language, then the languages at two different layers need not be compatible in any way, as long as the integrity of the message/envelope distinction is not violated. This idea is fundamental to my arguments in favor of a distinct routing language for UUCP.

It is in fact the case that in System V UUCP, the transport mechanism can deliver arbitrary data files across many "hops" without awareness of their contents. In earlier UUCPs, the transport mechanism could only deliver the message across one "hop", i.e., to a neighboring site, after which a program (rmail) had to be run to decide where to send the message next. This is where the trouble started, since it created the potential for circumventing the distinction between the message and its envelope. But, in fact, this was not done in standard Unix. A routing language as described previously was used: each rmail was given a string in the language, took the first site name off the front of that string, and delivered the message, along with an envelope consisting of the remainder of the string, to the site so named.
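The layering idea can be illustrated with a small Python sketch. The envelope format here (one text header line carrying a route, a payload length, and a CRC-32 computed with Python's zlib.crc32) is invented for illustration and is not the System V UUCP format or any real protocol; wrap and unwrap are hypothetical helper names. The point is only that each wrapping layer treats whatever it is handed, inner envelope included, as opaque bytes, and checks nothing but its own envelope.

```python
import zlib


def wrap(payload: bytes, route: str) -> bytes:
    """Put one envelope around an opaque payload.

    Invented format for illustration: a header line "<route> <length> <crc32>"
    followed by the payload bytes, which this layer never interprets.
    """
    header = f"{route} {len(payload)} {zlib.crc32(payload)}\n".encode("ascii")
    return header + payload


def unwrap(packet: bytes) -> tuple[str, bytes]:
    """Remove one envelope, check length and CRC, and return (route, payload)."""
    header, _, rest = packet.partition(b"\n")
    route, length, crc = header.decode("ascii").rsplit(" ", 2)
    payload = rest[: int(length)]
    if len(payload) != int(length) or zlib.crc32(payload) != int(crc):
        raise ValueError("envelope check failed: data lost or damaged")
    return route, payload


# Two layers: the outer layer carries the inner envelope as plain data.
body = b"From jer Mon Feb 10 19:51:58 1986\nAny bytes at all.\n"
inner = wrap(body, "pesnta!peora!jer")  # routing-language envelope
outer = wrap(inner, "link:pyramid")     # hypothetical link-level envelope

link_route, data = unwrap(outer)        # outer layer sees only bytes
assert data == inner
mail_route, message = unwrap(data)
assert (mail_route, message) == ("pesnta!peora!jer", body)
```

Note that the two route strings use unrelated notations ("link:pyramid" versus a bang path); neither layer needs to understand the other's, which is the property argued for above.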
It also prepended a "routing stamp" to the front of the message it was delivering. Although the routing string was kept in a separate file from the message and routing stamp, the routing string and routing stamp together can be considered the envelope, as distinct from the message body. In this way, the message can be delivered without tampering with the message body; and, as discussed in the previous posting on the routing language, the message can even be moved across transport mechanisms (e.g., between the ARPAnet and the UUCP network) without problems, as long as the receiving transport mechanism accepts a string in that routing language as an instruction on how to deliver the message.

The problem, and the source of much debate and confusion, arises when the envelope/message distinction is not maintained. This is especially easy to do when Sendmail is used to process the messages, since nothing in Sendmail prevents combining the two other than careful discipline. Fortunately, the "interpret the routing string in the context in which it was delivered" method provided by Gene Spafford does preserve that distinction.

In reality, of course, mailers do make changes to the message. The main change is to add lines telling how the message was delivered ("Received:" lines). Unfortunately, some also make other changes. I have argued in the past that this results from confusing the routing language with the language used to define the standard message format, i.e., from assuming (incorrectly, I have claimed) that because the envelope uses one language, the message must also comply with it (or vice versa).
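The hop-by-hop rmail behaviour, with the routing string and routing stamps kept as an envelope separate from the body, might be modelled as below. This is a hypothetical sketch, not the code of any real rmail: Envelope, rmail_hop, and route_message are invented names, and the stamp text "remote from <site>" is only loosely modelled on the traditional UUCP "From ... remote from ..." line. The body is passed along but never read or modified.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    # Hypothetical envelope: the remaining bang-path routing string plus the
    # routing stamps accumulated so far. The message body is never stored here.
    route: str
    stamps: list[str] = field(default_factory=list)


def rmail_hop(env: Envelope, this_site: str) -> tuple[str, Envelope]:
    """One rmail step: peel the next site off the route, stamp, and forward.

    Assumes env.route still contains at least one "!" (i.e. a further hop).
    """
    next_site, rest = env.route.split("!", 1)
    # Stamp text is illustrative only.
    stamp = f"remote from {this_site}"
    return next_site, Envelope(rest, env.stamps + [stamp])


def route_message(route: str, body: bytes, origin: str):
    """Follow a bang path hop by hop, never reading or altering the body.

    Returns (sites visited, final local recipient, stamps, body).
    """
    site, env = origin, Envelope(route)
    visited = [origin]
    while "!" in env.route:
        site, env = rmail_hop(env, site)
        visited.append(site)
    return visited, env.route, env.stamps, body
```

For example, route_message("pesnta!peora!jer", b"Hi\n", "pyramid") visits pyramid, pesnta, and peora, leaves "jer" as the local recipient, and returns the body byte-for-byte unchanged. Keeping the stamps in the envelope rather than splicing them into the body is one way of preserving the distinction discussed above.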
It is my contention, based on the model given above, that no such compliance is needed. Furthermore, since the original Unix mailers had a very trivial definition of the structure of the message itself, the message can be made to comply with RFC822, and thus with other RFC822-compliant networks, without the great deal of confusion that now exists over how to do so, and without making the message non-compliant with RFC822 while it is in the domain of UUCP.
--
UUCP: Ofc:  jer@peora.UUCP
      Home: jer@jerpc.CCUR.UUCP
CCUR DNS: peora, pesnta
US Mail: MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
         2486 Sand Lake Road, Orlando, FL 32809-7642