Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!ncis.tis.llnl.gov!mcb
From: mcb@ncis.tis.llnl.gov (Michael C. Berch)
Newsgroups: news.admin
Subject: Re: Usenet Top-Level Domain Census
Summary: About Message-IDs
Message-ID: <167@ncis.tis.llnl.gov>
Date: 29 Apr 89 07:14:09 GMT
References: <164@ncis.tis.llnl.gov> <1660@vicom.COM>
Reply-To: mcb@ncis.tis.llnl.gov (Michael C. Berch)
Organization: Lawrence Livermore National Laboratory, Livermore CA
Lines: 79

In article <1660@vicom.COM> lmb@vicom.COM (Larry Blair) writes:
> In <164@ncis.tis.llnl.gov> mcb@ncis.tis.llnl.gov (Michael C. Berch) writes:
> > We were sitting around jawing about conversion to the Domain Name
> > System and how many sites are still using ".UUCP" or other name schemes,
> > and how many sites don't put out legal Message-IDs at all.
>
> Wait a sec. If you're talking about the Message-ID in news postings, I
> have to disagree with your concept of illegal. A legal Message-ID is one
> that uniquely identifies a particular news posting, does not conflict with
> any other site's Message-IDs, and doesn't cause the news software to barf.
> It is only used for article identification and has nothing to do addresses
> or domains. You'll notice that my From: says lmb@vicom.COM and the Message-ID
> is . This only because I went in and hacked news source to
> produce this. Unmodified, it generated , using our uucp node
> name and appending our domain. That was a perfectly legal ID (I changed it
> for esthetic reasons). If a site wants to generate IDs like
> that's perfectly ok, provided they don't reuse that string. The software
> doesn't care.

Well, yes and no. Here's what the standard says (M. Horton & R. Adams,
Standard for the Interchange of USENET Messages, RFC-1036, December 1987):

"2.1.5. Message-ID

The "Message-ID" line gives the message a unique identifier. The
Message-ID may not be reused during the lifetime of any previous
message with the same Message-ID.
(It is recommended that no Message-ID be reused for at least two
years.) Message-ID's have the syntax:

    <string not containing blank or ">">

In order to conform to RFC-822, the Message-ID must have the format:

    <unique@full_domain_name>

where full_domain_name is the full name of the host at which the
message entered the network, including a domain that host is in, and
unique is any string of printing ASCII characters, not including "<"
(left angle bracket), ">" (right angle bracket), or "@" (at sign).
[...]"

My interpretation of this, and I believe it to be the general sense of
the community, is that (1) the software as it presently exists will
accept the weaker first form as a legal Message-ID, but (2) as a matter
of policy all Message-IDs should conform to RFC-822, which requires a
domain specification. This latter requirement is especially valuable
in preserving Message-IDs across news/mail links (particularly
important to those of us who do mailing-list/Usenet gatewaying), and
for the transport of news other than by UUCP.

Furthermore, as a matter of common sense, it is apparent that the only
guarantee of uniqueness of a Message-ID is for each host to insert a
unique host identifier. The only such unique identifier is a full
domain name (including hosts in the UUCP pseudo-domain that have
registered names in the UUCP map), since such names are assigned by an
external source. Otherwise, what is to prevent two hosts from both
using "" as a Message-ID?

Section 2.1.5 of RFC-1036 goes on to admonish programmers not to make
unwarranted assumptions about the content or syntax of Message-IDs; if
I were writing news software I would certainly not have it bounce
anything that didn't have a domain spec, but I felt perfectly free in
doing so for the purpose of the domain census. That fewer than 2% of
the IDs turned up as "illegal" under this metric proves the point, I
think. (A number of the illegal IDs were in fact badly formed even
under the weak standard, in that they did not begin and end with "<"
and ">".)
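
The distinction between the weak form and the RFC-822 form can be
checked mechanically. What follows is only a minimal sketch in Python,
not the awk script used for the census. The names classify and census
are hypothetical, and two rules are assumptions of the sketch rather
than the wording of RFC-1036: the weak form must be non-empty, and "a
domain" is taken to mean at least one dot in the host part.

```python
import re
from collections import Counter

# Weak form (RFC-1036, 2.1.5): "<", a string with no blank or ">", then ">".
# Requiring at least one character inside the brackets is an assumption here.
WEAK = re.compile(r"<[^ >]+>")

# Stricter form: <unique@full_domain_name>. Treating "a domain" as "at least
# one dot in the host part" is this sketch's assumption, not the RFC's wording.
# The single capture group is the last label, i.e. the top-level domain.
STRICT = re.compile(r"<[^<>@\s]+@(?:[A-Za-z0-9-]+\.)+([A-Za-z0-9-]+)>")


def classify(message_id):
    """Return (category, top_level_domain) for one Message-ID string."""
    if not WEAK.fullmatch(message_id):
        return ("malformed", None)       # fails even the weak syntax
    strict = STRICT.fullmatch(message_id)
    if strict is None:
        return ("weak-only", None)       # accepted by software, no domain spec
    return ("domain", strict.group(1).upper())


def census(message_ids):
    """Tally categories and top-level domains over an iterable of IDs."""
    categories, domains = Counter(), Counter()
    for message_id in message_ids:
        category, tld = classify(message_id)
        categories[category] += 1
        if tld is not None:
            domains[tld] += 1
    return categories, domains
```

Fed the two Message-IDs from this article's References: line, classify
reports both as "domain" (with GOV and COM respectively). A made-up ID
such as <42@example> comes back "weak-only", and one without the angle
brackets comes back "malformed". The "illegal" share under the census
metric would be everything outside the "domain" category.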

A couple of people wrote and asked me why I used the Message-IDs
instead of the From: line. The intent of the exercise was a quick
first-order approximation of the state of domain conversion in Usenet.
My awk script took about 3 minutes to run through the 64K history file.
Using the From: line would have meant either bashing 64,000 inodes or
writing something to collect From: lines as articles arrived (probably
the *right* way to do it); in either case I was unwilling to wait so
long for the results, which I doubt would have been significantly
different.

Michael C. Berch
mcb@ncis.llnl.gov / uunet!ncis.llnl.gov!mcb