Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!jerry
From: jerry@oliveb.UUCP (Jerry F Aguirre)
Newsgroups: news.software.b
Subject: Re: Is the history file really needed anymore?
Message-ID: <414@oliveb.UUCP>
Date: Wed, 21-Jan-87 15:35:25 EST
Article-I.D.: oliveb.414
Posted: Wed Jan 21 15:35:25 1987
Date-Received: Thu, 22-Jan-87 03:20:45 EST
References: <5504@ukma.ms.uky.csnet> <1307@ncr-sd.UUCP> <7529@utzoo.UUCP>
Reply-To: jerry@oliveb.UUCP (Jerry F Aguirre)
Organization: Olivetti ATC; Cupertino, Ca
Lines: 75
Keywords: netnews history dbm

In article <7529@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Eliminating the history file is, on the whole, a silly idea. Precisely
>what benefits is the change supposed to produce? If your news system
>lets articles get filed without putting them in the history file, this
>is a defect of the implementation, not the concept. Ditto for foulups in
>coordination between inews and expire. (What makes you think that the
>implementation of the new concept will be any better? Of course, it won't
>foul up in precisely the same way...)

The map is not the territory. Whenever you keep a separate index to
something there is the potential for the index to not agree. With the
index built into the data there is less potential for disagreement. In
this case we have three different versions, the history file, the dbm
copy, and the articles themselves.

I see a constant series of complaints on the net about the history file
being out of sync. The classic case seems to be running out of disk
space which causes the history file to wind up zero length. It is not
quite clear to me what changes to the news software could reliably
handle the problem of running out of disk space. With no history file
the potential for duplicates of the same article arriving can cause even
more disk space problems as well as user annoyance.

>The assumption that article-ids always have reasonable formats is terribly
>naive. Have any of the people suggesting this done *systematic* *surveys*
>of article-ids? I thought not. The vast majority of article-ids are in
>simple and reasonable format; it's the not-insignificant minority that
>aren't that causes trouble. (I speak from experience.)

The format of the message IDs and the difficulty in translating them
into filenames is not a serious problem. It is trivial to design a
scheme that can handle any character value, even non-ASCII characters.
For example: take the first character, write it as two hex digits, and
make a directory of that name (00 - ff). Repeat this with the next
character until the number of files per directory is reasonable.

As far as characters for the final file name, Unix allows any character
in a filename except '/' (and of course null). Message IDs are
specified to allow any character except the blank and null. Since a
blank can never appear in an ID, translate any '/' into a blank and you
have a legal Unix file name. Length of the ID is more of a problem and
could be handled by taking the first N (14) characters, making a
directory of that name, etc.

Non-Unix systems would have different restrictions but the same
principle could be applied. For instance, VMS systems have a more
restricted filename character set but do not suffer from the directory
length problem. An extension of the translate-characters-to-hex
strategy could be used.

If the article were created without write permission (mode 444) then the
create itself provides all the (portable) locking necessary to prevent
two "simultaneous" receptions of the same article.

As far as overhead, it is hard to understand how a single call to create
a file can have a prohibitive overhead. That is all that is required to
check the "history" AND make the new entry. As a directory need only be
made for each N files (say 100), the overhead for that cannot be
significant.

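The scheme described above might be sketched roughly as follows. This is
only an illustration under stated assumptions, not code from any news
release: the names id_to_path and file_article, the "levels" parameter,
and the UTF-8 encoding of the ID are all hypothetical. It also swaps in
an exclusive create (O_EXCL) to make the "already have it" check explicit,
rather than relying on a plain create failing on an existing mode 444
file. IDs longer than the filename limit are not handled here.

```python
import os

def id_to_path(root, message_id, levels=1):
    """Map a message ID to a path under root (hypothetical layout).

    One directory level is made per leading byte of the ID, named by
    that byte's two hex digits (00 - ff).
    """
    raw = message_id.encode("utf-8")  # assumption: IDs are encoded as UTF-8
    dirs = [format(b, "02x") for b in raw[:levels]]
    # '/' is the one character Unix forbids in a name; a blank can never
    # appear in a message ID, so the substitution cannot collide.
    name = message_id.replace("/", " ")
    return os.path.join(root, *dirs, name)

def file_article(root, message_id, body, levels=1):
    """Store an article; return False if it has already been received."""
    path = id_to_path(root, message_id, levels)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        # The exclusive create is both the "history" lookup and the new
        # entry: it fails if a file of that name already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(body)
    return True
```

Calling file_article twice with the same ID should store the article the
first time and report a duplicate the second time, with no separate index
consulted in between.
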
I have heard some mention of "searching" directories for each received
article. I don't understand what prompted someone to think this. How,
exactly, does the proposed method differ from the current method of
creating an article (and possibly the directory for it)?

I think that most of the same arguments could be made about the current
hierarchy for storing the news. Right now the news software uses links
for posting to multiple news groups when it could just as easily use
index files to cross post. I am no expert on "notes" but doesn't it use
that kind of scheme? I seem to remember postings arguing the same kind
of "advantages" for the notes system as are being given for the history
file.

Jerry Aguirre
Olivetti ATC