Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbosgd!ulysses!bellcore!decvax!decwrl!pyramid!pesnta!amd!amdcad!lll-crg!seismo!brl-adm!ron
From: ron@brl-adm.UUCP
Newsgroups: mod.sources.doc
Subject: rfc822 (2 of 5)
Message-ID: <723@brl-adm.ARPA>
Date: Tue, 20-May-86 00:00:08 EDT
Article-I.D.: brl-adm.723
Posted: Tue May 20 00:00:08 1986
Date-Received: Sat, 24-May-86 22:29:13 EDT
Distribution: net
Organization: Ballistic Research Lab
Lines: 580
Approved: RON@BRL.ARPA

     Standard for ARPA Internet Text Messages

     is analyzed into the following lexical symbols and types:

             :sysmail              quoted string
             @                     special
             Some-Group            atom
             .                     special
             Some-Org              atom
             ,                     special
             Muhammed              atom
             .                     special
             (I am the greatest)   comment
             Ali                   atom
             @                     special
             (the)                 comment
             Vegas                 atom
             .                     special
             WBA                   atom

     The canonical representations for the data in these addresses
     are the following strings:

             ":sysmail"@Some-Group.Some-Org

     and

             Muhammed.Ali@Vegas.WBA

     Note:  For purposes of display, and when passing such
            structured information to other systems, such as mail
            protocol services, there must be NO linear-white-space
            between <word>s that are separated by period (".") or
            at-sign ("@") and exactly one SPACE between all other
            <word>s.  Also, headers should be in a folded form.

     August 13, 1982               - 8 -                      RFC #822

     3.2.  HEADER FIELD DEFINITIONS

     These rules show a field meta-syntax, without regard for the
     particular type or internal syntax.  Their purpose is to permit
     detection of fields; also, they present to higher-level parsers
     an image of each field as fitting on one line.

     field       =  field-name ":" [ field-body ] CRLF

     field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">

     field-body  =  field-body-contents
                    [CRLF LWSP-char field-body]

     field-body-contents =
                   <the ASCII characters making up the field-body, as
                    defined in the following sections, and consisting
                    of combinations of atom, quoted-string, and
                    specials tokens, or else consisting of texts>

     3.3.  LEXICAL TOKENS

     The following rules are used to define an underlying lexical
     analyzer, which feeds tokens to higher-level parsers.  See the
     ANSI references, in the Bibliography.
                                                 ; (  Octal, Decimal.)
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     ALPHA       =  <any ASCII alphabetic character>
                                                 ; (101-132, 65.- 90.)
                                                 ; (141-172, 97.-122.)
     DIGIT       =  <any ASCII decimal digit>    ; ( 60- 71, 48.- 57.)
     CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                     character and DEL>          ; (    177,     127.)
     CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
     LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
     SPACE       =  <ASCII SP, space>            ; (     40,      32.)
     HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
     <">         =  <ASCII quote mark>           ; (     42,      34.)
     CRLF        =  CR LF

     LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE

     linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
                                                 ; CRLF => folding

     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.

     delimiters  =  specials / linear-white-space / comment

     text        =  <any CHAR, including bare    ; => atoms, specials,
                     CR & bare LF, but NOT       ;  comments and
                     including CRLF>             ;  quoted-strings are
                                                 ;  NOT recognized.

     atom        =  1*<any CHAR except specials, SPACE and CTLs>

     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

     domain-literal =  "[" *(dtext / quoted-pair) "]"

     dtext       =  <any CHAR excluding "[",     ; => may be folded
                     "]", "\" & CR, & including
                     linear-white-space>

     comment     =  "(" *(ctext / quoted-pair / comment) ")"

     ctext       =  <any CHAR excluding "(",     ; => may be folded
                     ")", "\" & CR, & including
                     linear-white-space>

     quoted-pair =  "\" CHAR                     ; may quote any char

     phrase      =  1*word                       ; Sequence of words

     word        =  atom / quoted-string

     3.4.  CLARIFICATIONS

     3.4.1.  QUOTING

     Some characters are reserved for special interpretation, such
     as delimiting lexical tokens.  To permit use of these characters
     as uninterpreted data, a quoting mechanism is provided.  To
     quote a character, precede it with a backslash ("\").

     This mechanism is not fully general.  Characters may be quoted
     only within a subset of the lexical constructs.  In particular,
     quoting is limited to use within:

                          -  quoted-string
                          -  domain-literal
                          -  comment

     Within these constructs, quoting is REQUIRED for CR and "\" and
     for the character(s) that delimit the token (e.g., "(" and ")"
     for a comment).  However, quoting is PERMITTED for any
     character.
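
     The lexical rules and the quoting mechanism above can be
     sketched as a small tokenizer.  This is only an illustration in
     Python, not part of the standard: the names tokenize,
     _scan_delimited and SPECIALS are hypothetical, and the sketch
     assumes its input is a single structured field body in which
     any CR or LF belongs to folding and can be skipped as white
     space.

```python
# Hypothetical sketch of a lexical analyzer for structured field bodies.
SPECIALS = set('()<>@,;:\\".[]')


def _scan_delimited(s, i, close, what):
    """Read from s[i] up to an unquoted `close`; return (text, next index)."""
    buf = []
    n = len(s)
    while i < n and s[i] != close:
        if s[i] == "\\" and i + 1 < n:
            i += 1                     # quoted-pair: keep only the quoted char
        buf.append(s[i])
        i += 1
    if i >= n:
        raise ValueError("unterminated " + what)
    return "".join(buf), i + 1


def tokenize(body):
    """Split a field body into (type, value) pairs."""
    tokens = []
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c in " \t\r\n":             # linear-white-space only separates
            i += 1
        elif c == '"':
            text, i = _scan_delimited(body, i + 1, '"', "quoted-string")
            tokens.append(("quoted-string", text))
        elif c == "[":
            text, i = _scan_delimited(body, i + 1, "]", "domain-literal")
            tokens.append(("domain-literal", text))
        elif c == "(":                 # comments may nest
            depth, buf = 1, []
            i += 1
            while i < n and depth:
                ch = body[i]
                if ch == "\\" and i + 1 < n:
                    i += 1
                    buf.append(body[i])
                elif ch == "(":
                    depth += 1
                    buf.append(ch)
                elif ch == ")":
                    depth -= 1
                    if depth:
                        buf.append(ch)
                else:
                    buf.append(ch)
                i += 1
            if depth:
                raise ValueError("unterminated comment")
            tokens.append(("comment", "".join(buf)))
        elif c in SPECIALS:
            tokens.append(("special", c))
            i += 1
        elif c < " " or c == "\x7f":
            raise ValueError("control character outside a quoted construct")
        else:                          # atom: run of non-special, non-CTL chars
            j = i
            while (j < n and body[j] > " " and body[j] != "\x7f"
                   and body[j] not in SPECIALS):
                j += 1
            tokens.append(("atom", body[i:j]))
            i = j
    return tokens
```

     The delimiters and quoting backslashes are dropped from the
     token values, so a caller sees only the data.  A backslash
     outside the three quoting constructs is reported as an ordinary
     special rather than as a quote.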

     Note:  In particular, quoting is NOT permitted within atoms.
            For example, when the local-part of an addr-spec must
            contain a special character, a quoted string must be
            used.  Therefore, a specification such as:

                         Full\ Name@Domain

            is not legal and must be specified as:

                         "Full Name"@Domain

     3.4.2.  WHITE SPACE

     Note:  In structured field bodies, multiple linear space ASCII
            characters (namely HTABs and SPACEs) are treated as
            single spaces and may freely surround any symbol.  In
            all header fields, the only place in which at least one
            LWSP-char is REQUIRED is at the beginning of
            continuation lines in a folded field.

     When passing text to processes that do not interpret text
     according to this standard (e.g., mail protocol servers), then
     NO linear-white-space characters should occur between a period
     (".") or at-sign ("@") and a <word>.  Exactly ONE SPACE should
     be used in place of arbitrary linear-white-space and comment
     sequences.

     Note:  Within systems conforming to this standard, wherever a
            member of the list of delimiters is allowed, LWSP-chars
            may also occur before and/or after it.

     Writers of mail-sending (i.e., header-generating) programs
     should realize that there is no network-wide definition of the
     effect of ASCII HT (horizontal-tab) characters on the
     appearance of text at another network host; therefore, the use
     of tabs in message headers, though permitted, is discouraged.

     3.4.3.  COMMENTS

     A comment is a set of ASCII characters, which is enclosed in
     matching parentheses and which is not within a quoted-string.
     The comment construct permits message originators to add text
     which will be useful for human readers, but which will be
     ignored by the formal semantics.  Comments should be retained
     while the message is subject to interpretation according to
     this standard.  However, comments must NOT be included in other
     cases, such as during protocol exchanges with mail servers.
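
     The white-space and comment rules above can be sketched as a
     function that regenerates a string for a process that does not
     interpret this standard.  This is a minimal Python illustration
     under stated assumptions: the name canonical is hypothetical,
     the input is assumed to be a list of (type, value) pairs already
     produced by some lexical analyzer, and a SPACE is emitted only
     between two adjacent words, with nothing added around any
     special.

```python
def canonical(tokens):
    """Rebuild a string from (type, value) tokens.

    Assumed token types: "atom", "quoted-string", "special", "comment".
    Comments are dropped, no white space is placed next to a special
    (so none surrounds "." or "@"), and exactly one SPACE separates
    two adjacent words.
    """
    out = []
    prev_was_word = False
    for kind, value in tokens:
        if kind == "comment":
            continue                   # comments are never passed on
        if kind == "quoted-string":
            # Re-quote the characters that must be quoted inside the string.
            escaped = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\r", "\\\r"))
            text = '"' + escaped + '"'
        else:
            text = value
        is_word = kind in ("atom", "quoted-string")
        if is_word and prev_was_word:
            out.append(" ")            # exactly ONE SPACE between words
        out.append(text)
        prev_was_word = is_word
    return "".join(out)


# Usage: a comment between a period and an atom leaves no white space.
print(canonical([("atom", "Muhammed"), ("special", "."),
                 ("comment", "I am the greatest"), ("atom", "Ali"),
                 ("special", "@"), ("comment", "the"), ("atom", "Vegas"),
                 ("special", "."), ("atom", "WBA")]))
```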

     Comments nest, so that if an unquoted left parenthesis occurs
     in a comment string, there must also be a matching right
     parenthesis.  When a comment acts as the delimiter between a
     sequence of two lexical symbols, such as two atoms, it is
     lexically equivalent with a single SPACE, for the purposes of
     regenerating the sequence, such as when passing the sequence
     onto a mail protocol server.  Comments are detected as such
     only within field-bodies of structured fields.

     If a comment is to be "folded" onto multiple lines, then the
     syntax for folding must be adhered to.  (See the "Lexical
     Analysis of Messages" section on "Folding Long Header Fields"
     above, and the section on "Case Independence" below.)  Note
     that the official semantics therefore do not "see" any unquoted
     CRLFs that are in comments, although particular parsing
     programs may wish to note their presence.  For these programs,
     it would be reasonable to interpret a "CRLF LWSP-char" as being
     a CRLF that is part of the comment; i.e., the CRLF is kept and
     the LWSP-char is discarded.  Quoted CRLFs (i.e., a backslash
     followed by a CR followed by a LF) still must be followed by at
     least one LWSP-char.

     3.4.4.  DELIMITING AND QUOTING CHARACTERS

     The quote character (backslash) and characters that delimit
     syntactic units are not, generally, to be taken as data that
     are part of the delimited or quoted unit(s).  In particular,
     the quotation-marks that define a quoted-string, the
     parentheses that define a comment and the backslash that quotes
     a following character are NOT part of the quoted-string,
     comment or quoted character.  A quotation-mark that is to be
     part of a quoted-string, a parenthesis that is to be part of a
     comment and a backslash that is to be part of either must each
     be preceded by the quote-character backslash ("\").
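
     The relationship between a quoted-string and the data it
     carries can be sketched as a pair of helpers.  This is a
     minimal Python illustration, not part of the standard; the
     names quote_string and unquote_string are hypothetical.

```python
def quote_string(data):
    """Wrap data in quotation-marks, quoting the characters that need it."""
    out = ['"']
    for ch in data:
        if ch in '"\\\r':              # quotation-mark, backslash and CR
            out.append("\\")
        out.append(ch)
    out.append('"')
    return "".join(out)


def unquote_string(token):
    """Return the data of a quoted-string, without delimiters or quotes."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted-string")
    out = []
    i, end = 1, len(token) - 1
    while i < end:
        if token[i] == "\\":
            i += 1                     # the backslash itself is not data
            if i >= end:
                raise ValueError("backslash quotes the closing delimiter")
        elif token[i] == '"':
            raise ValueError("unquoted quotation-mark inside quoted-string")
        out.append(token[i])
        i += 1
    return "".join(out)


# Usage: the delimiters and the quoting backslashes are not part of the data.
print(unquote_string('"Full Name"'))
print(quote_string('say "hi"'))
```

     Because any character may be quoted, unquote_string accepts a
     backslash before characters that did not need one, while
     quote_string adds a backslash only where it is needed.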

     Note that the syntax allows any character to be quoted within a
     quoted-string or comment; however only certain characters MUST
     be quoted to be included as data.  These characters are the
     ones that are not part of the alternate text group (i.e., ctext
     or qtext).

     The one exception to this rule is that a single SPACE is
     assumed to exist between contiguous words in a phrase, and this
     interpretation is independent of the actual number of
     LWSP-chars that the creator places between the words.  To
     include more than one SPACE, the creator must make the
     LWSP-chars be part of a quoted-string.

     Quotation marks that delimit a quoted string and backslashes
     that quote the following character should NOT accompany the
     quoted-string when the string is passed to processes that do
     not interpret data according to this specification (e.g., mail
     protocol servers).

     3.4.5.  QUOTED-STRINGS

     Where permitted (i.e., in words in structured fields)
     quoted-strings are treated as a single symbol.  That is, a
     quoted-string is equivalent to an atom, syntactically.  If a
     quoted-string is to be "folded" onto multiple lines, then the
     syntax for folding must be adhered to.  (See the "Lexical
     Analysis of Messages" section on "Folding Long Header Fields"
     above, and the section on "Case Independence" below.)
     Therefore, the official semantics do not "see" any bare CRLFs
     that are in quoted-strings; however particular parsing programs
     may wish to note their presence.  For such programs, it would
     be reasonable to interpret a "CRLF LWSP-char" as being a CRLF
     which is part of the quoted-string; i.e., the CRLF is kept and
     the LWSP-char is discarded.  Quoted CRLFs (i.e., a backslash
     followed by a CR followed by a LF) are also subject to rules of
     folding, but the presence of the quoting character (backslash)
     explicitly indicates that the CRLF is data to the quoted
     string.
     Stripping off the first following LWSP-char is also appropriate
     when parsing quoted CRLFs.

     3.4.6.  BRACKETING CHARACTERS

     There is one type of bracket which must occur in matched pairs
     and may have pairs nested within each other:

         o   Parentheses ("(" and ")") are used to indicate
             comments.

     There are three types of brackets which must occur in matched
     pairs, and which may NOT be nested:

         o   Colon/semi-colon (":" and ";") are used in address
             specifications to indicate that the included list of
             addresses is to be treated as a group.

         o   Angle brackets ("<" and ">") are generally used to
             indicate the presence of one machine-usable reference
             (e.g., delimiting mailboxes), possibly including
             source-routing to the machine.

         o   Square brackets ("[" and "]") are used to indicate the
             presence of a domain-literal, which the appropriate
             name-domain is to use directly, bypassing normal
             name-resolution mechanisms.

     3.4.7.  CASE INDEPENDENCE

     Except as noted, alphabetic strings may be represented in any
     combination of upper and lower case.  The only syntactic units
     which require preservation of case information are:

                 -  text
                 -  qtext
                 -  dtext
                 -  ctext
                 -  quoted-pair
                 -  local-part, except "Postmaster"

     When matching any other syntactic unit, case is to be ignored.
     For example, the field-names "From", "FROM", "from", and even
     "FroM" are semantically equal and should all be treated
     identically.

     When generating these units, any mix of upper and lower case
     alphabetic characters may be used.  The case shown in this
     specification is suggested for message-creating processes.

     Note:  The reserved local-part address unit, "Postmaster", is
            an exception.  When the value "Postmaster" is being
            interpreted, it must be accepted in any mixture of
            case, including "POSTMASTER", and "postmaster".

     3.4.8.  FOLDING LONG HEADER FIELDS

     Each header field may be represented on exactly one line
     consisting of the name of the field and its body, and
     terminated by a CRLF; this is what the parser sees.  For
     readability, the field-body portion of long header fields may
     be "folded" onto multiple lines of the actual field.  "Long" is
     commonly interpreted to mean greater than 65 or 72 characters.
     The former length serves as a limit, when the message is to be
     viewed on most simple terminals which use simple display
     software; however, the limit is not imposed by this standard.

     Note:  Some display software can often selectively fold lines,
            to suit the display terminal.  In such cases,
            sender-provided folding can interfere with the display
            software.

     3.4.9.  BACKSPACE CHARACTERS

     ASCII BS characters (Backspace, decimal 8) may be included in
     texts and quoted-strings to effect overstriking.  However, any
     use of backspaces which effects an overstrike to the left of
     the beginning of the text or quoted-string is prohibited.

     3.4.10.  NETWORK-SPECIFIC TRANSFORMATIONS

     During transmission through heterogeneous networks, it may be
     necessary to force data to conform to a network's local
     conventions.  For example, it may be required that a CR be
     followed either by LF, making a CRLF, or by <null>, if the CR
     is to stand alone.  Such transformations are reversed, when the
     message exits that network.

     When crossing network boundaries, the message should be treated
     as passing through two modules.  It will enter the first module
     containing whatever network-specific transformations were
     necessary to permit migration through the "current" network.
     It then passes through the modules:

         o   Transformation Reversal

             The "current" network's idiosyncrasies are removed and
             the message is returned to the canonical form specified
             in this standard.

         o   Transformation

             The "next" network's local idiosyncrasies are imposed
             on the message.
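
     The two modules can be sketched with the CR convention
     mentioned above as the local rule of the "current" network.
     This is a minimal Python illustration; the function names are
     hypothetical, and treating the "next" network as having no
     local conventions is an assumption made only to keep the
     example short.

```python
import re


def impose_cr_nul(data: bytes) -> bytes:
    """Transformation: follow every CR that does not start a CRLF with NUL."""
    return re.sub(rb"\r(?!\n)", b"\r\x00", data)


def remove_cr_nul(data: bytes) -> bytes:
    """Transformation Reversal: drop the NUL that was added after a CR."""
    return data.replace(b"\r\x00", b"\r")


def no_conventions(data: bytes) -> bytes:
    """Assumed 'next' network that imposes nothing on the message."""
    return data


def cross_boundary(data, remove_current, impose_next):
    """Return the message to canonical form, then apply the next network's rules."""
    canonical_form = remove_current(data)
    return impose_next(canonical_form)


# Usage: a message holding a lone CR travels through the first network
# in transformed shape and leaves it in the canonical form.
message = b"Subject: test\r\n\r\nbare CR here:\rdone\r\n"
on_net_a = impose_cr_nul(message)
assert cross_boundary(on_net_a, remove_cr_nul, no_conventions) == message
```

     Keeping reversal and imposition as separate functions means
     that each network only needs to know its own conventions and
     the canonical form, never those of its neighbours.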

                         ------------------
             From   ==>  | Remove Net-A   |
             Net-A       | idiosyncrasies |
                         ------------------
                                ||
                                \/
                           Conformance
                          with standard
                                ||
                                \/
                         ------------------
                         | Impose Net-B   |  ==>  To
                         | idiosyncrasies |       Net-B
                         ------------------

     4.  MESSAGE SPECIFICATION

     4.1.  SYNTAX

     Note:  Due to an artifact of the notational conventions, the
            syntax indicates that, when present, some fields must be
            in a particular order.  Header fields are NOT required
            to occur in any particular order, except that the
            message body must occur AFTER the headers.  It is
            recommended that, if present, headers be sent in the
            order "Return-Path", "Received", "Date", "From",
            "Subject", "Sender", "To", "cc", etc.

            This specification permits multiple occurrences of most
            fields.  Except as noted, their interpretation is not
            specified here, and their use is discouraged.

     The following syntax for the bodies of various fields should be
     thought of as describing each field body as a single long
     string (or line).  The "Lexical Analysis of Messages" section
     on "Folding Long Header Fields", above, indicates how such long
     strings can be represented on more than one line in the actual
     transmitted message.

     message     =  fields *( CRLF *text )       ; Everything after
                                                 ;  first null line
                                                 ;  is message body

     fields      =    dates                      ; Creation time,
                      source                     ;  author id & one
                    1*destination                ;  address required
                     *optional-field             ;  others optional

     source      = [  trace ]                    ; net traversals
                      originator                 ; original mail
                   [  resent ]                   ; forwarded

     trace       =    return                     ; path to sender
                    1*received                   ; receipt tags

     return      =  "Return-path" ":" route-addr ; return address

     received    =  "Received"    ":"            ; one per relay
                       ["from" domain]           ; sending host
                       ["by"   domain]           ; receiving host
                       ["via"  atom]             ; physical path
                      *("with" atom)             ; link/mail protocol
                       ["id"   msg-id]           ; receiver msg id
                       ["for"  addr-spec]        ; initial form