Xref: utzoo news.admin:15145 news.software.b:8225
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!ucla-cs!twinsun!eggert
From: eggert@twinsun.com (Paul Eggert)
Newsgroups: news.admin,news.software.b
Subject: Recently observed nonconforming Message-IDs (discussion)
Message-ID: <1991Jun12.071111.29652@twinsun.com>
Date: 12 Jun 91 07:11:11 GMT
Sender: usenet@twinsun.com
Organization: Twin Sun, Inc
Lines: 79
Nntp-Posting-Host: twinsun

Commonly used news transport software accepts many Message-IDs that do
not conform to the Internet RFCs (1036, 822, and 1123).  Of course, news
software need not, and probably should not, reject every article whose
Message-ID is nonconforming, but such Message-IDs are signs that
something is misconfigured or otherwise confused, and some of the
problems may be serious enough to warrant concern.  Typically the
problem lies at the originating site.

I looked at a sample news history file containing 63381 recent Usenet
Message-IDs and found 388 nonconforming ones (about 0.6%).  Here is a
table of the reasons for nonconformance, together with the number of
Message-IDs exhibiting each.

#articles  conformance problem, and example nonconforming Message-ID

  217      The local-part may not contain an unquoted `:'.
           <17169:Jun1122:04:5791@kramden.acf.nyu.edu>
   47      A domain may not end with `.'.
           <25590171@hpcvra.cv.hp.com.>
   42      A domain may not begin with `.'.
           <9106052317.AA01532@.devon.prepnet.com>
   36      A Message-ID cannot contain two unquoted `@'s.
           <1991Jun6.165406.70@%boot.decnet@edwards-tems.af.mil>
   32      The local-part may not end with `.'.
   22      The local-part may not begin with `.'.
           <.+9+WD_@cs.widener.edu>
   17      Two adjacent unquoted `.'s may not appear in a Message-ID.
           <9105211421.AA10695@mailserv.zdv.uni-tuebingen..de>
   12      The domain may not be empty.
           <1991Jun5.192745.1945@>
    2      The local-part may not contain an unquoted `]'.
           <]ocdj2.cj7@cat.de>
    2      Quotes must match.
           <"<9105141856.AA26881@cnmus.cnm.us.es>

The counts sum to 429 rather than 388, because some Message-IDs have
more than one problem.  The list of reasons may not be exhaustive,
because I stopped looking once I had found at least one reason for
every nonconforming Message-ID.

The history file examined was twinsun.com's, as of 1991/06/12 03:15 GMT.
This host runs C News 24-Mar-1991, subscribes only to the technical
Usenet newsgroups, and expires history entries after 30 days.

You can look for nonconforming Message-IDs on your host by running the
following shell script with your news history file as standard input;
it copies each history line that does not begin with a conforming
Message-ID to standard output.  It has been tested only under C News.
Please make sure your egrep can handle long extended regular expressions
with nested groups; I used GNU egrep.

#!/bin/sh

# These definitions are taken from RFC 822, except (as per RFC 1036)
# white space and nonprinting characters are excluded.

dtext='[!-Z^-~]'
qtext='[]-~!#-[]'
quoted_pair='\\[!-~]'
quoted_string="\"($qtext|$quoted_pair)*\""
atom="[-!#-'*-+/-9=?A-Z^-~]+"
word="($atom|$quoted_string)"
domain_literal="\\[($dtext|$quoted_pair)*\\]"
domain_ref="$atom"
sub_domain="($domain_ref|$domain_literal)"
domain="$sub_domain(\\.$sub_domain)*"
local_part="$word(\\.$word)*"
addr_spec="$local_part@$domain"
msg_id="<$addr_spec>"

egrep -v "^$msg_id"
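To try the pattern without a real history file, the sketch below wraps
the same RFC 822 definitions in a shell function and feeds it a few
Message-IDs.  It is an illustration under stated assumptions, not a
tested replacement for the script: it assumes a POSIX shell and grep
(spelling egrep as `grep -E'), forces the C locale so that the bracket
ranges are plain byte ranges, and leaves the closing `]' of a domain
literal unescaped, since `]' is already an ordinary character outside a
bracket expression in an ERE.  The function name nonconforming_ids and
the sample <abc@[192.0.2.1]> are hypothetical; the other samples come
from this article and the table.

```shell
#!/bin/sh
# Sketch only: the article's Message-ID pattern wrapped in a function.
# nonconforming_ids is a hypothetical name chosen for this example.

LC_ALL=C; export LC_ALL   # byte-value bracket ranges, whatever the locale

# RFC 822 definitions, minus white space and nonprinting characters.
dtext='[!-Z^-~]'
qtext='[]-~!#-[]'
quoted_pair='\\[!-~]'
quoted_string="\"($qtext|$quoted_pair)*\""
atom="[-!#-'*-+/-9=?A-Z^-~]+"
word="($atom|$quoted_string)"
domain_literal="\\[($dtext|$quoted_pair)*]"   # trailing ] left unescaped
domain_ref="$atom"
sub_domain="($domain_ref|$domain_literal)"
domain="$sub_domain(\\.$sub_domain)*"
local_part="$word(\\.$word)*"
addr_spec="$local_part@$domain"
msg_id="<$addr_spec>"

# Copy to stdout every input line that does NOT begin with a conforming
# Message-ID.  Only the start of the line is examined, so anything after
# the ID (as in a history line) is ignored.
nonconforming_ids() {
    grep -E -v "^$msg_id"
}

# Demo: two conforming IDs (filtered out) and two from the table above.
printf '%s\n' \
    '<1991Jun12.071111.29652@twinsun.com>' \
    '<abc@[192.0.2.1]>' \
    '<25590171@hpcvra.cv.hp.com.>' \
    '<1991Jun5.192745.1945@>' |
nonconforming_ids
```

Run with sh, the demo prints only <25590171@hpcvra.cv.hp.com.> and
<1991Jun5.192745.1945@>, the trailing-`.' and empty-domain cases.  Once
the function is defined, `nonconforming_ids < history' (with your own
history file's path) does the same job as the script.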