Path: utzoo!utgpu!watserv1!watmath!maytag!looking!brad
From: brad@looking.on.ca (Brad Templeton)
Newsgroups: news.software.b
Subject: Re: review of news programs/readers?
Keywords: nn, Gnews, gnus
Message-ID: <113812@looking.on.ca>
Date: 15 Mar 90 05:04:17 GMT
References: <1990Mar11.004538.2773@aai.uu.net> <90Mar12.150907est.1337@smoke.cs.toronto.edu> <1990Mar13.084110.1299@aai.uu.net> <90Mar14.003318est.234@smoke.cs.toronto.edu> <708@software.software.org>
Organization: Looking Glass Software Ltd.
Lines: 371
Class: information,original

Newsclip is not a reader, but an adjunct to one. Eric Raymond and I designed a protocol so that this could be a general class of programs, known as filters or kill programs.

Effectively, a reader starts up and opens a pipe/stream/socket to the filter program, which also starts. The reader goes through and prepares to show articles to the user, but first it asks the filter program about each one to see if it should go through.

This is how I read news, and it's actually quite efficient on a server and not too bad on a client. (On a client, you can usually filter on just the header, and that's not too bad, but if you have to filter on strings in the body, you need to fetch the whole article before presenting it.)

I have added one command to RN to send commands to the filter program. This way simple macros can say things like "kill all threads started by this user" or "I want to see more about this subject" to the filter program.

Since the reader and the filter work in parallel, it would make sense at some point to design a reader that has upcoming articles queried in the background while you're reading the current one.

Newsclip can also filter a .newsrc, but that's not a newsreading tool.

Here's the info on how it works. You can find it in the file ~ftp/ClariNet/nc.tar.Z on uunet. There's a small fee if you want to use it on a regular basis.
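To make the reader/filter exchange concrete, here is a minimal C sketch of the filter side of such a query protocol. The line format (QUERY and KILLUSER requests, ACCEPT/REJECT/OK/ERROR replies) and the names handle_request() and serve() are hypothetical, invented for illustration; they are not the actual NewsClip/RN protocol. To keep the sketch self-contained, it assumes the reader sends the article's From: address with each query, where a real filter would look at the article itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL line protocol, for illustration only:
 *   reader -> filter: "QUERY <from-address>"     reply: "ACCEPT" or "REJECT"
 *   reader -> filter: "KILLUSER <from-address>"  reply: "OK"
 *   anything else                                reply: "ERROR"            */

#define MAX_KILLED 64
#define ADDR_LEN   128

static char killed[MAX_KILLED][ADDR_LEN];  /* addresses the user has killed */
static int nkilled = 0;

static int is_killed(const char *addr)
{
    for (int i = 0; i < nkilled; i++)
        if (strcmp(killed[i], addr) == 0)
            return 1;
    return 0;
}

/* Answer one request line (already stripped of its newline). */
const char *handle_request(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "QUERY ", 6) == 0)
        return is_killed(line + 6) ? "REJECT" : "ACCEPT";
    if (strncmp(line, "KILLUSER ", 9) == 0) {
        /* Remember the address; if the table is full it is silently dropped. */
        if (nkilled < MAX_KILLED && !is_killed(line + 9)) {
            strncpy(killed[nkilled], line + 9, ADDR_LEN - 1);
            killed[nkilled][ADDR_LEN - 1] = '\0';
            nkilled++;
        }
        return "OK";
    }
    return "ERROR";
}

/* Answer requests from the reader until it closes its end of the pipe. */
void serve(FILE *in, FILE *out)
{
    char line[256];
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';    /* strip the line terminator */
        fprintf(out, "%s\n", handle_request(line));
        fflush(out);   /* the reader is blocked waiting for this answer */
    }
}
```

In use, the filter's main() would just call serve(stdin, stdout) with the reader attached to the other end of the pipe. The fflush() after each reply matters: without it, stdio buffering could leave the reader waiting for an answer that is sitting in the filter's output buffer.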
-----
With NewsClip, you can control what is shown to you as finely as you desire, with all the tools of a powerful programming language at your disposal. The programs you write are compiled, so they filter your news quickly -- often without any noticeable delay.

Your NewsClip programs accept, reject or *weight* articles based on C-like expressions you write to describe what you want or don't want to see. You might reject all articles in "rec.humor" that are cross-posted to "talk.bizarre", unless they are posted by a user at your own site, with:

        reject if is rec.humor && is talk.bizarre && domain(from) != my_domain;

Your compiled programs can work in several ways. With a few small alterations to your newsreader (patches are provided for the RN reader), your program will filter news as you read it, usually with no delay. You simply see the articles you wish to see. The modified RN reader can also send commands to the filter program to interactively control filtering.

You can also arrange to filter news in the background, or at night. Your filter program will read your news subscription file, scan all unread articles, and pre-mark undesired articles as read, so that you never see them. This works with any system that uses a .newsrc file.

NewsClip programs can also filter a list of article filenames, such as a list of articles to be fed to another system. This way you can fine-tune the feeding of articles to other systems as precisely as you desire. You can also arrange to feed other sites by creating a .newsrc for the destination site, so that you don't need the "sys" file.

An important thing to remember is that with NewsClip, you are not limited to describing what you don't want to see, as is the case with RN KILL files. You can also request what you want to see, and eliminate all the rest. Or you can combine the two, or vary the rules from group to group or message to message.

Here are some of the things you can do:

o Eliminate or request followup trees.
  You can kill off or follow a topic or subtopic based on the subject or the "Message-id:" and "References:" lines.

o Control crossposting.
  You can ask to see only articles crossposted to multiple groups, or reject or accept articles based on what groups they are crossposted to. You can even reject any article crossposted to too many groups.

o Eliminate a user, group of users or even a site.
  You can arrange not only to skip the postings of certain users, but also to skip the followups to those postings. You can thus eliminate unwanted users or classes of postings from *your* net -- you'll never even know they're there.

o Keyword match articles based on the presence of patterns in header items or various sections of the article text.
  You can ask to see articles that mention "unix" but don't mention "ms-dos."

o Accept your own articles and give priority to followups to your own articles.

o Accept articles posted only to a local distribution, even if they're in a netwide group.
  As shown above, you can arrange to accept articles from people on your own site, even if you might not normally see them.

o Reject articles with signatures that are too long, or which contain too much included text.

o Accept only original (non-followup) articles and followups to those articles that you have explicitly decided to track.
  (It's like having a USENET 1/10th the volume of the current one.)

Anything that a computer program can figure out about an article can be used to decide whether you will see the article or not.

How NewsClip Works

The NewsClip compiler translates your filter program into a C program. This C program is compiled by your local C compiler and linked with the NewsClip library. That library processes articles and handles the interface to news readers and the real world.

The goal of your program is to decide whether to accept or reject an article.
This can be done piece by piece (reject if *this*, accept if *that*) or it can be done by giving a score to the article based on conditional expressions. You can add points to articles with things you like, and take them away from articles with things you don't like. At the end, if the score is still positive, you see the article. For example, this takes away one point for every line an article runs over 200:

        if( lines > 200 ) adjust 200-lines;

The language is C-like, but has data types that represent the kind of things found in articles and article headers. Perhaps most important is the "database" type, which is really an integer array that you can index with string values. You use databases to keep track of users, message-ids, patterns, subjects and other key items you might look for in articles.

Your NewsClip program can update databases on its own. If an article comes in that you really hate, you can automatically put its message-id in a database that marks messages you don't want to see followups of. To you, it's as if the message was never posted.

        if( from in badusers ) {
                badmessages[message_id] = true;
                reject;
        }

Databases can be stored on disk, and a special feature allows you to "expire" database records that have not been accessed in a while.

Your filter program can run in parallel with a newsreader like RN. We have developed a general protocol that any newsreader can use to talk to a filter program. This includes the ability to send commands to the filter program, such as "kill all articles with this subject."

To examine an article, you mainly work with the header. There are predefined external variables for all the major headers, or you can custom-declare your own:

        header string array keys : "keywords", ",";

(This gets you an array variable called "keys" whose elements will be the comma-delimited keywords from the "Keywords:" header line.)

You can define your own procedures and functions in the NewsClip language, or even import C functions from the C libraries or your own C programs.

        extern int strlen( string );

NewsClip contains a special *distribution* feature that lets you check the distribution of an article and estimate how many machines it will go to. You can thus accept or reject articles based on their audience, as well as their newsgroup.

You can split up articles into various regions when doing pattern matching in the body. The signature, main text, non-included text and whole body are all regions that you can examine independently.

        reject if line_count(signature) > 20 || newtext has "ron.*reagan";

Sample Program

/* Sample NEWSCLIP program that shows what you can do */
/* This program is far more complex than a typical filter program,
   which would usually be quite short. */
/* Please folks, this is not the newsclip program that I use, and I don't
   advocate all the different filtering things here.  I am just using them
   as examples of how to do certain things net people have suggested they
   wanted done. */

/* You can include pre-defined header lines */
extern userid From;                     /* the From: line */
extern newsgroup array newsgroups;      /* the Newsgroups: line */
extern int distribution_level;          /* max distr of article */
extern string array references;         /* parent articles */
extern string Subject;                  /* subject line */
extern int followup;                    /* is it a followup?
                                           */
extern int lines;                       /* header variable */

/* or define your own header lines */
header string mess_id : "message-id";

/* declare variables */
int counter;

/* some databases I will look things up in */
database badmessages;   /* message-ids I don't want to see followups to */
database hated_users;   /* users I don't want to see articles from */
database my_articles;   /* message-ids that I want to see ALL followups to */

/* declare external C functions from the Newsclip library or your own C libraries */
extern int strlen( string );

/* you can define procedures and functions */
int nice_group( newsgroup n )
{
    extern string left( string, int );

    /* you like all sci newsgroups and rec.humor.funny */
    return n == #rec.humor.funny || left(n,1) == "sci";
}

procedure INIT()
{
    extern procedure set_include_prefix(string);

    /* this code gets run when the program starts */
    set_include_prefix( "[:>]" );
    hated_users = read_database( "~./hatedusers" );
    my_articles = read_database( "~./myarticles" );
}

procedure STARTGROUP()
{
    /* This gets called when we begin to scan a new newsgroup */
    /* read in the database of bad message-ids for this group */
    badmessages = read_database( "~./kill/~n/killdb" );
}

procedure ENDGROUP()
{
    /* this gets called to end the newsgroup */
    extern datetime time_now;

    /* write out the bad message database, deleting all entries that
       are older than one month */
    write_database( badmessages, "~./kill/~n/killdb", time_now - month );
    free_database( badmessages );
}

/* here is the main part.
   The code that is executed for every article to accept or reject it */
procedure ARTICLE()
{
    newsgroup n;
    extern string domain( string );
    extern string right( string, int );
    extern int dlevel( newsgroup );
    extern string my_domain;
    extern string my_mail_address;

    /* show me everything written by people at my own site */
    if( domain(From) == my_domain ) {
        /* Note my own articles in a database of good ones */
        if( From == my_mail_address )
            my_articles[mess_id] = true;
        accept;
    }
    else if( domain(From) == "hated.domain.com" )
        reject;         /* never show me anything from THAT site */

    /* also show me anything posted only for citywide distribution */
    accept if distribution_level <= dlevel(#city);

    reject if count(newsgroups) > 6;        /* I hate crossposting */

    /* See if it's a followup to one of MY messages */
    accept if references in my_articles;

    /* See if any of the messages this is a followup of are in our
       database of bad messages.  If so, reject it */
    reject if references in badmessages;

    /* and of course, kill the bad guys */
    reject if From in hated_users;

    /* Now do the newsgroup specific code */
    for( n in newsgroups )
        switch( n ) {
        case #rec.humor:
            /* adjust the score of messages that are crossposted to
               groups you don't like */
            if( is talk.bizarre || is alt.flame )
                adjust -10;
            /* but I like local humour */
            accept if distribution_level <= dlevel(#country);
            break;
        case #news.groups:
            /* If you really don't like a user in a group, arrange to store
               the message id of every message he posts in your bad message
               database.  You won't even see the followups, and it will be
               as though he didn't exist on the net.
             */
            if( From == "karl@ddsw1.mcs.com" ) {
                badmessages[mess_id] = true;
                reject;
            }
            break;
        case #sci.physics:
            /* I only want to see messages that are crossposted to both
               sci.physics AND sci.astro, not just one of them */
            reject if !is sci.astro;
            break;
        case #rec.arts.comics:
            /* I only want articles that mention watchmen in the subject */
            if( Subject has "watchmen" )
                accept;
            else
                reject;
        case #news.admin:
            /* bump the score of any article that mentions my name */
            if( text has "brad.*templeton" || Subject has "brad" )
                adjust 1000;
            break;
        case #talk.politics.misc:
            /* I hate long rebuttals.  If the article is mostly lines that
               are included from another, then can it */
            if( followup && lines / (1+line_count(included)) < 2 )
                reject;
            /* I hate long signatures on short articles! */
            if( lines < 25 && line_count(signature) > 7 )
                reject;
            break;
        case #talk.politics.theory:
            /* search for libertarian only in non-included text */
            if( newtext has "libertarian" || newtext has "ncp" )
                accept;
            else
                reject;
        case #comp.risks:
        case #rec.arts.sf-lovers:
            /* my favourite groups */
            adjust 20;
            break;
        default:
            if( nice_group(n) )
                adjust 15;
            break;
        }

    if( is alt.flame )
        adjust -5;      /* I would rather not see these */

    /* default is a score of 1, which means accept */
    /* here at the end, we accept if the score is greater than 0,
       or if there was an explicit accept, of course */
}

procedure TERMINATE()
{
    extern datetime time_now;

    /* The program is done.  Write out global databases
       (to the same file INIT reads them from) */
    write_database( my_articles, "~./myarticles", time_now - 3 * week );
}

--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473