Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbosgd!ulysses!bellcore!decvax!decwrl!pyramid!pesnta!amd!amdcad!lll-crg!seismo!brl-adm!ron
From: ron@brl-adm.UUCP
Newsgroups: mod.sources.doc
Subject: rfc822 (2 of 5)
Message-ID: <723@brl-adm.ARPA>
Date: Tue, 20-May-86 00:00:08 EDT
Article-I.D.: brl-adm.723
Posted: Tue May 20 00:00:08 1986
Date-Received: Sat, 24-May-86 22:29:13 EDT
Distribution: net
Organization: Ballistic Research Lab
Lines: 580
Approved: RON@BRL.ARPA


     Standard for ARPA Internet Text Messages


        is analyzed into the following lexical symbols and types:

                    :sysmail              quoted string
                    @                     special
                    Some-Group            atom
                    .                     special
                    Some-Org              atom
                    ,                     special
                    Muhammed              atom
                    .                     special
                    (I am  the greatest)  comment
                    Ali                   atom
                    @                     atom
                    (the)                 comment
                    Vegas                 atom
                    .                     special
                    WBA                   atom

        The canonical representations for the data in these  addresses
        are the following strings:

                        ":sysmail"@Some-Group.Some-Org

        and

                            Muhammed.Ali@Vegas.WBA

        Note:  For purposes of display, and when passing  such  struc-
               tured information to other systems, such as mail proto-
               col  services,  there  must  be  NO  linear-white-space
               between  <word>s  that are separated by period (".") or
               at-sign ("@") and exactly one SPACE between  all  other
               <word>s.  Also, headers should be in a folded form.


     August 13, 1982               - 8 -                      RFC #822


     Standard for ARPA Internet Text Messages


     3.2.  HEADER FIELD DEFINITIONS

          These rules show a field meta-syntax, without regard for the
     particular  type  or internal syntax.  Their purpose is to permit
     detection of fields; also, they present to  higher-level  parsers
     an image of each field as fitting on one line.

     field       =  field-name ":" [ field-body ] CRLF

     field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">

     field-body  =  field-body-contents
                    [CRLF LWSP-char field-body]

     field-body-contents =
                   <the ASCII characters making up the field-body, as
                    defined in the following sections, and consisting
                    of combinations of atom, quoted-string, and
                    specials tokens, or else consisting of texts>


     August 13, 1982               - 9 -                      RFC #822


     Standard for ARPA Internet Text Messages


     3.3.  LEXICAL TOKENS

          The following rules are used to define an underlying lexical
     analyzer,  which  feeds  tokens to higher level parsers.  See the
     ANSI references, in the Bibliography.

                                                 ; (  Octal, Decimal.)
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     ALPHA       =  <any ASCII alphabetic character>
                                                 ; (101-132, 65.- 90.)
                                                 ; (141-172, 97.-122.)
     DIGIT       =  <any ASCII decimal digit>    ; ( 60- 71, 48.- 57.)
     CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                     character and DEL>          ; (    177,     127.)
     CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
     LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
     SPACE       =  <ASCII SP, space>            ; (     40,      32.)
     HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
     <">         =  <ASCII quote mark>           ; (     42,      34.)
     CRLF        =  CR LF

     LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE

     linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
                                                 ; CRLF => folding

     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.

     delimiters  =  specials / linear-white-space / comment

     text        =  <any CHAR, including bare    ; => atoms, specials,
                     CR & bare LF, but NOT       ;  comments and
                     including CRLF>             ;  quoted-strings are
                                                 ;  NOT recognized.

     atom        =  1*<any CHAR except specials, SPACE and CTLs>

     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

     domain-literal =  "[" *(dtext / quoted-pair) "]"


     August 13, 1982              - 10 -                      RFC #822


     Standard for ARPA Internet Text Messages


     dtext       =  <any CHAR excluding "[",     ; => may be folded
                     "]", "\" & CR, & including
                     linear-white-space>

     comment     =  "(" *(ctext / quoted-pair / comment) ")"

     ctext       =  <any CHAR excluding "(",     ; => may be folded
                     ")", "\" & CR, & including
                     linear-white-space>

     quoted-pair =  "\" CHAR                     ; may quote any char

     phrase      =  1*word                       ; Sequence of words

     word        =  atom / quoted-string


     3.4.  CLARIFICATIONS

     3.4.1.  QUOTING

        Some characters are reserved for special interpretation,  such
        as  delimiting lexical tokens.  To permit use of these charac-
        ters as uninterpreted data, a quoting mechanism  is  provided.
        To quote a character, precede it with a backslash ("\").

        This mechanism is not fully general.  Characters may be quoted
        only  within  a subset of the lexical constructs.  In particu-
        lar, quoting is limited to use within:

                             -  quoted-string
                             -  domain-literal
                             -  comment

        Within these constructs, quoting is REQUIRED for  CR  and  "\"
        and for the character(s) that delimit the token (e.g., "(" and
        ")" for a comment).  However, quoting  is  PERMITTED  for  any
        character.

        Note:  In particular, quoting is NOT permitted  within  atoms.
               For  example  when  the local-part of an addr-spec must
               contain a special character, a quoted  string  must  be
               used.  Therefore, a specification such as:

                            Full\ Name@Domain

               is not legal and must be specified as:

                            "Full Name"@Domain


     August 13, 1982              - 11 -                      RFC #822


     Standard for ARPA Internet Text Messages


     3.4.2.  WHITE SPACE

        Note:  In structured field bodies, multiple linear space ASCII
               characters  (namely  HTABs  and  SPACEs) are treated as
               single spaces and may freely surround any  symbol.   In
               all header fields, the only place in which at least one
               LWSP-char is REQUIRED is at the beginning of  continua-
               tion lines in a folded field.

        When passing text to processes  that  do  not  interpret  text
        according to this standard (e.g., mail protocol servers), then
        NO linear-white-space characters should occur between a period
        (".") or at-sign ("@") and a <word>.  Exactly ONE SPACE should
        be used in place of arbitrary linear-white-space  and  comment
        sequences.

        Note:  Within systems conforming to this standard, wherever  a
               member of the list of delimiters is allowed, LWSP-chars
               may also occur before and/or after it.

        Writers of  mail-sending  (i.e.,  header-generating)  programs
        should realize that there is no network-wide definition of the
        effect of ASCII HT (horizontal-tab) characters on the  appear-
        ance  of  text  at another network host; therefore, the use of
        tabs in message headers, though permitted, is discouraged.

     3.4.3.  COMMENTS

        A comment is a set of ASCII characters, which is  enclosed  in
        matching  parentheses  and which is not within a quoted-string
        The comment construct permits message originators to add  text
        which  will  be  useful  for  human readers, but which will be
        ignored by the formal semantics.  Comments should be  retained
        while  the  message  is subject to interpretation according to
        this standard.  However, comments  must  NOT  be  included  in
        other  cases,  such  as  during  protocol  exchanges with mail
        servers.

        Comments nest, so that if an unquoted left parenthesis  occurs
        in  a  comment  string,  there  must  also be a matching right
        parenthesis.  When a comment acts as the delimiter  between  a
        sequence of two lexical symbols, such as two atoms, it is lex-
        ically equivalent with a single SPACE,  for  the  purposes  of
        regenerating  the  sequence, such as when passing the sequence
        onto a mail protocol server.  Comments are  detected  as  such
        only within field-bodies of structured fields.

        If a comment is to be "folded" onto multiple lines,  then  the
        syntax  for  folding  must  be  adhered to.  (See the "Lexical


     August 13, 1982              - 12 -                      RFC #822


     Standard for ARPA Internet Text Messages


        Analysis of Messages" section on "Folding Long Header  Fields"
        above,  and  the  section on "Case Independence" below.)  Note
        that  the  official  semantics  therefore  do  not  "see"  any
        unquoted CRLFs that are in comments, although particular pars-
        ing programs may wish to note their presence.  For these  pro-
        grams,  it would be reasonable to interpret a "CRLF LWSP-char"
        as being a CRLF that is part of the comment; i.e., the CRLF is
        kept  and  the  LWSP-char is discarded.  Quoted CRLFs (i.e., a
        backslash followed by a CR followed by a  LF)  still  must  be
        followed by at least one LWSP-char.

     3.4.4.  DELIMITING AND QUOTING CHARACTERS

        The quote character (backslash) and  characters  that  delimit
        syntactic  units  are not, generally, to be taken as data that
        are part of the delimited or quoted unit(s).   In  particular,
        the   quotation-marks   that   define   a  quoted-string,  the
        parentheses that define  a  comment  and  the  backslash  that
        quotes  a  following  character  are  NOT  part of the quoted-
        string, comment or quoted character.  A quotation-mark that is
        to  be  part  of  a quoted-string, a parenthesis that is to be
        part of a comment and a backslash that is to be part of either
        must  each be preceded by the quote-character backslash ("\").
        Note that the syntax allows any character to be quoted  within
        a  quoted-string  or  comment; however only certain characters
        MUST be quoted to be included as data.  These  characters  are
        the  ones that are not part of the alternate text group (i.e.,
        ctext or qtext).

        The one exception to this rule  is  that  a  single  SPACE  is
        assumed  to  exist  between  contiguous words in a phrase, and
        this interpretation is independent of  the  actual  number  of
        LWSP-chars  that  the  creator  places  between the words.  To
        include more than one SPACE, the creator must make  the  LWSP-
        chars be part of a quoted-string.

        Quotation marks that delimit a quoted string  and  backslashes
        that  quote  the  following character should NOT accompany the
        quoted-string when the string is passed to processes  that  do
        not interpret data according to this specification (e.g., mail
        protocol servers).

     3.4.5.  QUOTED-STRINGS

        Where permitted (i.e., in words in structured fields)  quoted-
        strings  are  treated  as a single symbol.  That is, a quoted-
        string is equivalent to an atom, syntactically.  If a  quoted-
        string  is to be "folded" onto multiple lines, then the syntax
        for folding must be adhered to.  (See the "Lexical Analysis of


     August 13, 1982              - 13 -                      RFC #822


     Standard for ARPA Internet Text Messages


        Messages"  section  on "Folding Long Header Fields" above, and
        the section on "Case  Independence"  below.)   Therefore,  the
        official  semantics  do  not  "see" any bare CRLFs that are in
        quoted-strings; however particular parsing programs  may  wish
        to  note  their presence.  For such programs, it would be rea-
        sonable to interpret a "CRLF LWSP-char" as being a CRLF  which
        is  part  of the quoted-string; i.e., the CRLF is kept and the
        LWSP-char is discarded.  Quoted CRLFs (i.e., a backslash  fol-
        lowed  by  a CR followed by a LF) are also subject to rules of
        folding, but the presence of the quoting character (backslash)
        explicitly  indicates  that  the  CRLF  is  data to the quoted
        string.  Stripping off the first following LWSP-char  is  also
        appropriate when parsing quoted CRLFs.

     3.4.6.  BRACKETING CHARACTERS

        There is one type of bracket which must occur in matched pairs
        and may have pairs nested within each other:

            o   Parentheses ("(" and ")") are used  to  indicate  com-
                ments.

        There are three types of brackets which must occur in  matched
        pairs, and which may NOT be nested:

            o   Colon/semi-colon (":" and ";") are   used  in  address
                specifications  to  indicate that the included list of
                addresses are to be treated as a group.

            o   Angle brackets ("<" and ">")  are  generally  used  to
                indicate  the  presence of a one machine-usable refer-
                ence (e.g., delimiting mailboxes), possibly  including
                source-routing to the machine.

            o   Square brackets ("[" and "]") are used to indicate the
                presence  of  a  domain-literal, which the appropriate
                name-domain  is  to  use  directly,  bypassing  normal
                name-resolution mechanisms.

     3.4.7.  CASE INDEPENDENCE

        Except as noted, alphabetic strings may be represented in  any
        combination of upper and lower case.  The only syntactic units


     August 13, 1982              - 14 -                      RFC #822


     Standard for ARPA Internet Text Messages


        which requires preservation of case information are:

                    -  text
                    -  qtext
                    -  dtext
                    -  ctext
                    -  quoted-pair
                    -  local-part, except "Postmaster"

        When matching any other syntactic unit, case is to be ignored.
        For  example, the field-names "From", "FROM", "from", and even
        "FroM" are semantically equal and should all be treated ident-
        ically.

        When generating these units, any mix of upper and  lower  case
        alphabetic  characters  may  be  used.  The case shown in this
        specification is suggested for message-creating processes.

        Note:  The reserved local-part address unit, "Postmaster",  is
               an  exception.   When  the  value "Postmaster" is being
               interpreted, it must be  accepted  in  any  mixture  of
               case, including "POSTMASTER", and "postmaster".

     3.4.8.  FOLDING LONG HEADER FIELDS

        Each header field may be represented on exactly one line  con-
        sisting  of the name of the field and its body, and terminated
        by a CRLF; this is what the parser sees.  For readability, the
        field-body  portion of long header fields may be "folded" onto
        multiple lines of the actual field.  "Long" is commonly inter-
        preted  to  mean greater than 65 or 72 characters.  The former
        length serves as a limit, when the message is to be viewed  on
        most  simple terminals which use simple display software; how-
        ever, the limit is not imposed by this standard.

        Note:  Some display software often can selectively fold lines,
               to  suit  the display terminal.  In such cases, sender-
               provided  folding  can  interfere  with   the   display
               software.

     3.4.9.  BACKSPACE CHARACTERS

        ASCII BS characters (Backspace, decimal 8) may be included  in
        texts and quoted-strings to effect overstriking.  However, any
        use of backspaces which effects an overstrike to the  left  of
        the beginning of the text or quoted-string is prohibited.


     August 13, 1982              - 15 -                      RFC #822


     Standard for ARPA Internet Text Messages


     3.4.10.  NETWORK-SPECIFIC TRANSFORMATIONS

        During transmission through heterogeneous networks, it may  be
        necessary  to  force data to conform to a network's local con-
        ventions.  For example, it may be required that a CR  be  fol-
        lowed  either by LF, making a CRLF, or by <null>, if the CR is
        to stand alone).  Such transformations are reversed, when  the
        message exits that network.

        When  crossing  network  boundaries,  the  message  should  be
        treated  as  passing  through  two modules.  It will enter the
        first module containing whatever network-specific  transforma-
        tions  that  were  necessary  to  permit migration through the
        "current" network.  It then passes through the modules:

            o   Transformation Reversal

                The "current" network's idiosyncracies are removed and
                the  message  is returned to the canonical form speci-
                fied in this standard.

            o   Transformation

                The "next" network's local idiosyncracies are  imposed
                on the message.

                                ------------------
                    From   ==>  | Remove Net-A   |
                    Net-A       | idiosyncracies |
                                ------------------
                                       ||
                                       \/
                                  Conformance
                                  with standard
                                       ||
                                       \/
                                ------------------
                                | Impose Net-B   |  ==>  To
                                | idiosyncracies |       Net-B
                                ------------------


     August 13, 1982              - 16 -                      RFC #822


     Standard for ARPA Internet Text Messages


     4.  MESSAGE SPECIFICATION

     4.1.  SYNTAX

     Note:  Due to an artifact of the notational conventions, the syn-
            tax  indicates that, when present, some fields, must be in
            a particular order.  Header fields  are  NOT  required  to
            occur  in  any  particular  order, except that the message
            body must occur AFTER  the  headers.   It  is  recommended
            that,  if  present,  headers be sent in the order "Return-
            Path", "Received", "Date",  "From",  "Subject",  "Sender",
            "To", "cc", etc.

            This specification permits multiple  occurrences  of  most
            fields.   Except  as  noted,  their  interpretation is not
            specified here, and their use is discouraged.

          The following syntax for the bodies of various fields should
     be  thought  of  as  describing  each field body as a single long
     string (or line).  The "Lexical Analysis of Message"  section  on
     "Long  Header Fields", above, indicates how such long strings can
     be represented on more than one line in  the  actual  transmitted
     message.

     message     =  fields *( CRLF *text )       ; Everything after
                                                 ;  first null line
                                                 ;  is message body

     fields      =    dates                      ; Creation time,
                      source                     ;  author id & one
                    1*destination                ;  address required
                     *optional-field             ;  others optional

     source      = [  trace ]                    ; net traversals
                      originator                 ; original mail
                   [  resent ]                   ; forwarded

     trace       =    return                     ; path to sender
                    1*received                   ; receipt tags

     return      =  "Return-path" ":" route-addr ; return address

     received    =  "Received"    ":"            ; one per relay
                       ["from" domain]           ; sending host
                       ["by"   domain]           ; receiving host
                       ["via"  atom]             ; physical path
                      *("with" atom)             ; link/mail protocol
                       ["id"   msg-id]           ; receiver msg id
                       ["for"  addr-spec]        ; initial form


     August 13, 1982              - 17 -                      RFC #822