Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!ncis.tis.llnl.gov!mcb
From: mcb@ncis.tis.llnl.gov (Michael C. Berch)
Newsgroups: news.admin
Subject: Re: Usenet Top-Level Domain Census
Summary: About Message-IDs
Message-ID: <167@ncis.tis.llnl.gov>
Date: 29 Apr 89 07:14:09 GMT
References: <164@ncis.tis.llnl.gov> <1660@vicom.COM>
Reply-To: mcb@ncis.tis.llnl.gov (Michael C. Berch)
Organization: Lawrence Livermore National Laboratory, Livermore CA
Lines: 79

In article <1660@vicom.COM> lmb@vicom.COM (Larry Blair) writes:
> In <164@ncis.tis.llnl.gov> mcb@ncis.tis.llnl.gov (Michael C. Berch) writes:
> > We were sitting around jawing about conversion to the Domain Name
> > System and how many sites are still using ".UUCP" or other name schemes, 
> > and how many sites don't put out legal Message-IDs at all.
> 
> Wait a sec.  If you're talking about the Message-ID in news postings, I
> have to disagree with your concept of illegal.  A legal Message-ID is one
> that uniquely identifies a particular news posting, does not conflict with
> any other site's Message-IDs, and doesn't cause the news software to barf.
> It is only used for article identification and has nothing to do with addresses
> or domains.  You'll notice that my From: says lmb@vicom.COM and the Message-ID
> is <nnnn@vicom.COM>.  This is only because I went in and hacked news source to
> produce this.  Unmodified, it generated <nnnn@vsi1.COM>, using our uucp node
> name and appending our domain.  That was a perfectly legal ID (I changed it
> for esthetic reasons).  If a site wants to generate IDs like <xAqd-froo%baz>
> that's perfectly ok, provided they don't reuse that string.  The software
> doesn't care.

Well, yes and no.  Here's what the standard says (M. Horton & R. Adams, 
Standard for the Interchange of USENET Messages, RFC-1036, December 1987):

   "2.1.5.  Message-ID

    The "Message-ID" line gives the message a unique identifier.  The
    Message-ID may not be reused during the lifetime of any previous
    message with the same Message-ID.  (It is recommended that no
    Message-ID be reused for at least two years.)  Message-ID's have the
    syntax:

                     <string not containing blank or ">">

    In order to conform to RFC-822, the Message-ID must have the format:

                          <unique@full_domain_name>

    where full_domain_name is the full name of the host at which the
    message entered the network, including a domain that host is in, and
    unique is any string of printing ASCII characters, not including "<"
    (left angle bracket), ">" (right angle bracket), or "@" (at sign).
    [...]"

My interpretation of this, and I believe it to be the general sense of
the community, is that (1) the software as it presently exists will 
accept the weaker first form as a legal Message-ID, but (2) as a
matter of policy all Message-IDs should conform to RFC-822, which
requires a domain specification.  This latter requirement is
especially valuable in preserving Message-IDs across news/mail links
(particularly important to those of us who do mailing-list/Usenet
gatewaying), and for the transport of news other than by UUCP. 
Furthermore, as a matter of common sense, it is apparent that the only
guarantee of uniqueness of a Message-ID is for each host to insert a
unique host identifier.  The only such identifier is a full
domain name (including the names of hosts in the UUCP pseudo-domain that
are registered in the UUCP map), since such names are assigned by an
external source.  Otherwise, what is to prevent two hosts from both
using "<xAqd-froo%baz>" as a Message-ID?
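To make the uniqueness argument concrete, here is a minimal sketch in
Python. It is not how any existing news software generates IDs; the
function name message_id_maker and the per-host counter are assumptions
chosen for illustration. Each host only has to keep its own "unique"
part from repeating, and the externally assigned domain name keeps
different hosts from colliding.

```python
import itertools


def message_id_maker(full_domain_name, start=1):
    """Return a function that yields successive IDs for one host.

    Hypothetical helper: the local part is a plain per-host counter,
    and the host part is whatever full domain name the caller supplies.
    """
    counter = itertools.count(start)

    def next_id():
        # <unique@full_domain_name>, with "unique" being the local counter.
        return "<%d@%s>" % (next(counter), full_domain_name)

    return next_id


# Two hosts whose local counters happen to coincide.
host_a = message_id_maker("ncis.tis.llnl.gov", start=167)
host_b = message_id_maker("vicom.COM", start=167)
id_a = host_a()
id_b = host_b()
```

Here id_a is <167@ncis.tis.llnl.gov> and id_b is <167@vicom.COM>: the
local parts are identical, yet the IDs differ because the host parts do.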

Section 2.1.5 of RFC-1036 goes on to admonish programmers not to make
unwarranted assumptions about the content or syntax of Message-IDs; if
I were writing news software I would certainly not have it bounce anything
that didn't have a domain spec, but I felt perfectly free to apply the
stricter test for the purpose of the domain census.  That fewer than 2%
of the IDs turned up as "illegal" under this metric proves the point,
I think. (A number of the illegal IDs were in fact badly formed even under
the weak standard, in that they did not begin with "<" and end with ">".)
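
The weak form and the RFC-822 form can be told apart mechanically. What
follows is a minimal sketch in Python, not the test actually used for
the census and not anything taken from the news software. The names
WEAK, STRICT, and classify are made up for illustration; "blank" is
assumed to mean any whitespace; and requiring at least one dot in the
host part, with labels made of letters, digits, and hyphens, is an
assumption about what counts as a full domain name rather than something
RFC-1036 spells out.

```python
import re

# Weak form: "<", then one or more characters that are neither
# whitespace nor ">", then ">".
WEAK = re.compile(r"<[^\s>]+>")

# Strict form: <unique@full_domain_name>. "unique" is printing ASCII
# (0x21-0x7E) without "<", ">", or "@". The dotted-host requirement is
# an assumption of this sketch.
STRICT = re.compile(r"<[!-;=?A-~]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+>")


def classify(message_id):
    """Return "strict", "weak", or "malformed" for one Message-ID."""
    if STRICT.fullmatch(message_id):
        return "strict"
    if WEAK.fullmatch(message_id):
        return "weak"
    return "malformed"
```

Under these assumptions classify("<1660@vicom.COM>") returns "strict",
classify("<xAqd-froo%baz>") returns "weak", and an ID without its angle
brackets, such as "167@ncis.tis.llnl.gov", returns "malformed".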

A couple of people wrote and asked me why I used the Message-IDs instead
of the From: line.  The intent of the exercise was a quick first-order
approximation of the state of domain conversion in Usenet. My awk
script took about 3 minutes to run through the 64K-entry history file.  
Using the From: line would have meant either bashing 64,000 inodes or
writing something to collect From: lines as articles arrived (probably
the *right* way to do it); in either case I was unwilling to wait so
long for the results, which I doubt would have been significantly
different.
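
For anyone who wants to try a similar tally, here is a rough sketch in
Python. It is not the awk script mentioned above, and it rests on
assumptions that should be checked against the local news installation:
that the history file holds one article per line and that the Message-ID
is the first tab-separated field. The names top_level_domain and census,
and the "(illegal)" bucket label, are invented for this example.

```python
from collections import Counter


def top_level_domain(message_id):
    """Return the upper-cased last domain label, or None if the ID
    lacks angle brackets, an "@", or a dotted host part."""
    if not (message_id.startswith("<") and message_id.endswith(">")):
        return None
    unique, at, host = message_id[1:-1].rpartition("@")
    if not at or not unique or "@" in unique or "." not in host:
        return None
    tld = host.rsplit(".", 1)[1]
    return tld.upper() if tld else None


def census(lines):
    """Count articles per top-level domain from history-file lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: Message-ID first, then a tab, then the rest.
        message_id = line.split("\t", 1)[0]
        counts[top_level_domain(message_id) or "(illegal)"] += 1
    return counts
```

Because census accepts any iterable of lines, an open file object can be
passed to it directly, so the history file is read once from start to
end without touching the individual article files.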

Michael C. Berch  
mcb@ncis.llnl.gov / uunet!ncis.llnl.gov!mcb