Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!allbery From: brad@looking.ON.CA (Brad Templeton) Newsgroups: comp.sources.misc Subject: v09i082: newsclip 1.1, part 13 of 15 Message-ID: <74077@uunet.UU.NET> Date: 20 Dec 89 01:28:58 GMT Sender: allbery@uunet.UU.NET Lines: 1268 Approved: allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc) Posting-number: Volume 9, Issue 82 Submitted-by: brad@looking.ON.CA (Brad Templeton) Archive-name: newsclip/part13 #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh 'doc/man.mm.3' <<'END_OF_FILE' XA brief introduction to compiling and running your NewsClip filtering Xprograms was given in chapter 2. We will now explore this area Xin more detail. X X.H 2 "Compiling" X.P XThe \fBncc\fP compiler compiles your programs by translating them into XC programs, compiling these with your C compiler, and linking the result Xwith the NewsClip library. X.P XThe translation is fairly simple as compilations go, other than providing Xfor special conversions for NewsClip's data types. It is the library Xthat does most of the work, and thus makes it easy to write a XNewsClip program. X.P XWhen you compile with X.Bb Xncc myprog.nc X.Be Xeverything is done in one step. The source is placed in \fBmyprog.c\fP, Xthat is compiled, including a special file of definitions (usually found Xin \fB/usr/lib/news/newsclip/ucode.h\fP), and this is linked with the library, Xusually found in \fB/usr/lib/news/newsclip/cliplib.a\fP. The C program Xsource is left around for you to examine. The executable program, Xready to run, is placed in the file \fBnclip\fP in your current Xdirectory. X.P XYou can alter this a bit if you like. 
For example, you can skip the XC compile and link stage with the \fI-link\fP option, allowing you to Xexamine the resulting C program and compile it on your own. Options Xare described later. X X.H 3 "Preprocessor" X.P XThe \fBncc\fP compiler passes your input program through the X``C preprocessor.'' This is the same macro language and conditional Xcompilation facility that C uses. CPP \fIdirectives\fP are all keyed by lines Xthat begin with a ``#'' character. These include the X\fB#include "filename"\fP directive, which causes the contents of the named Xfile to be inserted into the compilation stream. X.P XIf you have a lot of little filtering routines for each newsgroup that Xyou put in individual files, you can get them all combined together Xwhen you compile with \fB#include\fP directives. Your big \fBswitch\fP Xstatement might look like: X.Bb X for( n in newsgroups ) switch( n ) { X#include "news/admin/kill.nc" X#include "news/groups/kill.nc" X#include "sci/physics/kill.nc" X#include "comp/sys/ibm/pc/kill.nc" X#include "rec/humor/kill.nc" X#include "rec/humor/funny/kill.nc" X } X.Be X.P XYou could then edit each file individually, as desired. X.P XOther directives include \fB#define\fP, which defines manifest constants Xand macros, and \fB#ifdef\fP/\fB#else\fP/\fB#endif\fP which allow Xconditional compilation based on whether or not a symbol has been defined Xwith \fB#define\fP or a command line option. X.P XA full exploration of CPP is beyond the scope of this manual. See Xdocumentation on the C language, as well as the ``man'' entry for XCPP in your own system's documentation. X.H 3 "Options" X.P XYou can control the compiling process to some degree by providing options Xto the compiler. X.P XThe compiler's primary argument is the sole input source file, which by Xconvention should end with the ``.nc'' (for NewsClip) extension. 
X.P XUntagged arguments with an extension of ``.c,'' ``.o'' or ``.a'' will not be Xtreated as NewsClip source programs, but rather as C source code, system Xobject code or library files. XThey will be passed directly to the C compiler to be linked in with your Xprogram. X.P XThe other options use LGS's own option style, which is a variant of the Xconventional Unix option style. Binary (on/off) options are preceded by Xa plus ``\fB+\fP'' or minus ``\fB-\fP,'' where plus means the option Xis turned on, and minus means the option is turned off. You can type Xa whole option name after the ``+/-,'' or just enough to uniquely Xdistinguish the option -- usually just a single letter. Thus X\fI-link\fP works as well as \fI-l\fP. X.P XValued options are written with a keyword (or perhaps the single letter Xabbreviation of the keyword), an equals sign ``\fB=\fP'' and a string Xvalue. For example, \fIo=myclip\fP. X X.H 4 "-link" X.P XThe \fI-link\fP option disables the C compile and link phase of compiling. XNo executable program will be produced. A C program with the same name Xas your source file (but with an extension of ``.c'') will be produced, Xassuming there are no errors. X X.H 4 "output=pathname" X.P XThis option specifies a name for the executable news Xfiltering program. The default is \fBnclip\fP. X X.H 4 "Define=defstring" X.P XThis option specifies a preprocessor definition to be Xpassed along to the C preprocessor. For example, \fID=bsd\fP would Xcause the manifest symbol ``bsd'' to be defined in \fB#ifdef\fP tests. XYou can specify several of these. X X.H 4 "Include=dirpathname" X.P XThis specifies a directory that the preprocessor Xshould search for files included with the \fB#include\fP directive. XYou can specify several of these. X X.H 4 "intermediate=file.c" X.P XThis allows you to specify an alternate Xintermediate name for the generated C program. Normally this name will Xbe derived from the name of the source file. 
The provided name must end Xwith ``.c.'' X X.H 4 "ccoption=option" X.P XThis lets you specify a string that Xis to be passed directly along to the C compiler for the compile and Xlink phase. You can pass any special local options your C compiler Xneeds. X X.H 4 "-externals" X.P XThe \fI-externals\fP option disables the ability of users to make Xexternal import declarations of symbols other than those in the Xapproved list of the NewsClip language. This limits the language Xto the definition in this manual. X.P XThis is only a very mild security feature, and any capable malicious Xprogrammer could get around it fairly easily. If you are going to Xallow remote sites to submit newsclip feeding programs to you, it is Ximportant that you create independent system userids for these programs, Xand run them with the real and effective userid properly set. Do Xnot use the ``uucp'' or any other system userid. XDepend on operating system tools for all your security, not this option. X X.H 3 "Single-User" X.P XIf you only have a single user copy of NewsClip, and, because you Xare not a system administrator, you have been unable to install XNewsClip files in system directories, then the files \fBcliplib.a\fP Xand \fBucode.h\fP must be in your current directory when you compile. X X.H 2 "Externals" X.P XSo long as the \fI-external\fP compiling option is not used, NewsClip Xprograms may make external declarations for arbitrary C routines. This Xincludes routines from the standard C library, or routines from Xspecial C source or object code modules provided on the \fBncc\fP Xcommand line. X.P XFor users willing to write their own C code, the potential here is Xtruly unlimited. The NewsClip language has been designed to be Xsimple and special purpose. There are some less common things that Xare simply not easy to do within it. External functions can do all Xthis for you. 
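X.P
XAs a rough illustration of an external written in C, the sketch below
Xwraps a routine whose name contains upper case letters in an all-lower-case
Xinterface routine, since NewsClip ignores the case of letters. The names
X\fBCountWords\fP and \fBcountwords\fP are hypothetical, and the sketch
Xassumes that a NewsClip string reaches C as an ordinary NUL-terminated
Xcharacter pointer, which this manual does not confirm.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical existing routine with upper case letters in its name.
 * Counts whitespace-separated words in a NUL-terminated string. */
int CountWords(const char *s)
{
    int count = 0;
    int in_word = 0;

    if (s == NULL)              /* treat a missing string as having no words */
        return 0;
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;        /* first character of a new word */
            count++;
        }
    }
    return count;
}

/* All-lower-case interface routine; this is the name a NewsClip
 * program would import (assumption: strings are passed as char *). */
int countwords(const char *s)
{
    return CountWords(s);
}
```

XUnder those assumptions, the module could be named on the \fBncc\fP command
Xline as a ``.c'' file and declared in the NewsClip program in the same style
Xas the other externals in this manual, for example
X\fBextern int countwords( string );\fP.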
X.P XEven if you have source code to the NewsClip compiler, we advise you Xto do any special tricks with your own C code, rather than by changing Xthe compiler to extend the language. Neither route is officially Xsupported, but the former is preferred. X.P XImportant note: Since the case of letters in NewsClip doesn't matter, Xall C externals must be entirely in lower case. If you want to call Xan existing routine that has upper case letters in its name, you will Xhave to write a small interface routine to do the calling. With variables Xthat have upper case names, you will be out of luck. X X.H 2 "Filtering" X.P XOnce you have compiled your program, there are several ways you can Xrun it to filter news articles. We'll assume your program is in X\fBnclip\fP for now. First of all, \fBnclip\fP has a number of Xcommand line options you can use to control its operation. X.P XMost important are the ``modes'' of operation, specified with the X\fImode=\fP option. Essentially, you have written a subroutine which, Xwhen passed an article, decides whether to accept or reject that article. XThe control portion of the \fBnclip\fP program sets up how the articles Xwill be gathered and submitted to your procedure, and what will be done Xwith the results. X.P XYou are already familiar with \fInewsrc\fP mode, which you get by Xusing the \fImode=newsrc\fP option. We will explain it in more Xdetail here. X X.H 3 "Newsrc Mode (mode=newsrc)" X.P XIn \fInewsrc\fP mode, the \fBnclip\fP program processes a standard Xformat \fB.newsrc\fP file. Most newsreaders keep track of what the Xuser has read with a file of this name in the home directory. The XRN newsreader also keeps other files in the same directory as this file. X.P XIn \fInewsrc\fP mode, \fBnclip\fP also keeps a file Xcalled \fB.newsrclas\fP to keep track of the last article that has been Xprocessed by the \fBnclip\fP program in each desired newsgroup. 
This Xis necessary because it's not possible to tell where to start processing Xjust from the \fB.newsrc\fP file and the news \fBactive\fP file. X.P XWhen run in \fInewsrc\fP mode, \fBnclip\fP examines the \fB.newsrc\fP Xfile, \fB.newsrclas\fP file and the USENET active file X(usually \fB/usr/lib/news/active\fP). From these it calculates the Xrange of unread articles that must be processed. X.P XFirst it calls your \fBinit\fP procedure. X.P XIt then loops through the subscribed newsgroups in the \fB.newsrc\fP Xfile. As it starts each group, it calls your \fBstartgroup\fP procedure. XIt then goes through all the appropriate articles, and calls your X\fBarticle\fP procedure on each one. Each rejected article is marked Xas read. When the group is done, the \fBendgroup\fP procedure is called. X.P XWhen all is done, the \fBterminate\fP procedure is called, and the X\fB.newsrc\fP file is written out, with all the rejected articles marked Xas read. The \fB.newsrclas\fP file is written out with all articles Xmarked as processed. (This way, if you call \fBnclip\fP again immediately, Xit will do nothing unless new articles have arrived on your machine.) X.P XSome options and environment variables affect this procedure. See below. X.H 3 "Filter Mode (mode=filter)" X.P XThis mode works quite differently, and does not even involve the X\fB.newsrc\fP or \fBactive\fP files. Instead, it expects a list of Xfilenames to appear on the standard input. Each file should be a XUSENET article file. Each such article will be passed to your X\fBarticle\fP procedure. If the article is accepted, its filename Xwill be written to the standard output. If the article is rejected, Xnothing is written. X.P XThe result is a filtered list of accepted filenames. This is ideal Xfor controlling a batched feed to another site. Many news systems run Xby having the news processing programs output a list of article files Xto a special file. 
Periodic programs examine this file and batch together Xthe articles found in it. X.P XSimply modify your batching procedure to have the file processed by X.Bb Xnclip /tmp/me/.newsrc Xsrch m=n n=/tmp/me/.newsrc Xsetenv DOTDIR /tmp/me Xrn Xsetenv DOTDIR $HOME X.Be X X X X X.H 1 "Tips and Traps" X.P XIn this chapter, we remind you of some important things to remember when Xcoding your NewsClip programs. In particular, important differences from XC are pointed out. X X.H 2 "Memory" X.P XDon't create any loops that keep allocating strings -- for example with X\fBconcat\fP. Temporary memory is just allocated in a big stack, and Xit is never freed up until an article is done. A loop could easily make Xyou run out of memory, aborting your session. X.P XNaturally, be equally careful of permanent memory that you allocate in Xdatabases and permanent strings. Be sure to free all databases that Xyou are not using. (This is not necessary within the \fBterminate\fP Xprocedure.) X.P XRemember, when you read a database in from a file, you still get a database Xthat uses some memory, even if the file is missing or empty. X X.H 2 "Strings" X.P XMake sure all your search strings are in lower case letters, unless you Xknow you are searching a text field or string that has not been converted Xto lower case. Normally almost all such things are pre-converted to Xlower case, so if you put upper case in your patterns or test strings Xyou will not get a match. X X.H 2 "Integers" X.P XIf your machine only supports 16 bit integers, you can only place values Xfrom -32768 to 32767 in your integers. It is very easy to overflow. XIn fact, in some newsgroups, the article numbers may already overflow Xyour integers. X.P XOne place to watch out is the running \fIscore\fP that you modify with Xthe \fBadjust\fP statement. If you adjust the score beyond the range Xof an integer, it could wrap around, causing exactly the wrong result. 
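X.P
XThe wraparound described above can be seen in a few lines of C. This is
Xonly a sketch of the arithmetic, not NewsClip's own code: the names
X\fBadjust_wrapping\fP and \fBadjust_saturating\fP are made up for the
Xillustration, and \fBint16_t\fP stands in for the integer type of a
X16 bit machine.

```c
#include <stdint.h>

/* What a 16-bit two's-complement add produces when it overflows:
 * the result wraps around to the other end of the range. */
int16_t adjust_wrapping(int16_t score, int16_t delta)
{
    /* Reduce the exact sum modulo 65536, then map back to -32768..32767. */
    int32_t wrapped = (int32_t)(uint16_t)((int32_t)score + (int32_t)delta);

    if (wrapped > INT16_MAX)
        wrapped -= 65536;
    return (int16_t)wrapped;
}

/* A safer alternative: clamp at the limits instead of wrapping. */
int16_t adjust_saturating(int16_t score, int16_t delta)
{
    int32_t sum = (int32_t)score + (int32_t)delta;  /* exact in 32 bits */

    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```

XWith a score of 30000 and an adjustment of 5000, the wrapping version
Xyields -30536, turning a strongly favoured article into a strongly
Xdisfavoured one, while the saturating version stops at 32767.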
X.P XMake sure your adjustments are appropriate, and not so large that they Xmight overflow if they all go the same way. If you are worried that Xyou might reach overflow at a given point, import the \fBscore\fP variable Xand put the following statement in at various points in your procedure. X.Bb Xif( score > 25000 || score < -25000 ) X return; X.Be XThis will stop the process if the score gets ridiculously high or low. X.P XSome of the functions returning large things like Xarticle sizes will compensate for small integers by returning the Xlargest integer (ie. 32767) when the actual result is out of bounds. XYou may wish to watch for this if you were counting on an exact result. X.P XDate/time variables will always be able to hold more than a 16 bit integer, Xbut their use as anything but date values is discouraged. X X.H 2 "Nil Headers" X.P XIf you use any array, userid or string header variables that are not Xguaranteed to be in an article, then you should always check to make Xsure the variables don't have a nil value before you use one. If Xyou assign into some index of a nil array, you could get into real Xtrouble. X.P XUsually you do this with a short circuit \fB&&\fP operator, as in: X.Bb Xif( keywords != nilarray && "rot13" in keywords ) X reject; X.Be X.P XWith integer and date variables, you will only get a zero value, so it Xmay not be absolutely necessary to check, but it is still always a good Xidea. X X X.B "Important Note:" X.P XRemember this: A nil array is not the same as an empty array. A nil string Xis not an empty string (\fB""\fP). If you use variables that might Xbe nil, beware. X.P XIn general, it's a good idea to use variables that can't be nil, Xsuch as \fBrdistribution\fP. You can also make your own functions to Xdo certain tasks for you. For example: X.Bb Xstring Xsafestring( string s ) X{ X return s == nilstring ? "" : s; X} X.Be Xcould be applied so that nil strings always become empty strings. 
You Xcould also define this as a CPP macro, or just use the \fB?\fP query Xoperator wherever necessary. X.P XThe nil values are important, as they let you test if a header field was Xpresent in the article at all. In some cases, such as the \fBApproved:\fP Xheader, the important thing is that the header is present. Currently, at Xleast, it doesn't matter what's in it. X.H 2 "Cross Posting" X.P XIf you are writing a program to be used in \fIbatch\fP mode, be sure to Xinclude the declaration: X.Bb Xextern string array xref; X.Be Xsomewhere in your program, or use some other system to avoid duplicate Xarticles. X.P XYou may want to do this \fBextern\fP even in \fInewsrc\fP mode, to Xsimplify processing. It does nothing in the processing modes that don't Xwork with a \fB.newsrc\fP file, like the \fIpipe\fP and \fIfilter\fP modes. X.P XAnother way to eliminate crossposts is to reject all articles where the Xfirst newsgroup in the \fBnewsgroups\fP array is not the current newsgroup, Xso long as that first newsgroup is a \fBnewsrc\_group\fP. (If it isn't Xyou will want to key on the first newsgroup in the array that is found Xin the \fB.newsrc\fP.) X X.H 2 "Speed" X.P XDon't import externals that you don't need. Sometimes just importing Xan external variable requests pre-processing that takes time. XThis applies to Xall the header variables, along with \fBdistribution\_level\fP and some Xof the statistical variables. X.P XBe conservative with your use of references to segments of the article Xbody. This can involve lots of disk I/O if you have lots of articles Xto scan. We advise that you keep body scans to your newsgroup specific Xcode. If you have a body scan for every article, you can expect the Xprogram to take a lot more time. Of course, NewsClip is quite fast, Xso this may be acceptable, particularly if it saves \fIyou\fP time. X.P XTry to use the variables like \fBlines\fP and \fBarticle\_bytes\fP Xthat don't usually require the reading of the whole article. 
Note Xthat \fBarticle\_bytes\fP sometimes does have to read the whole article Xwhen you are running in pipe mode on a system that doesn't have the Xnews article files. X.P XIn general, your code is getting compiled to C, and thus directly to Xmachine code. Don't be afraid of loops and integer operations in your code. XThey should go quite quickly. X.P XOptimize where you can with the use of the \fI+only\fP option or the X\fBreject\_all\fP and \fBaccept\_all\fP variables. Try the \fBnamed\_group\fP Xtrick described in the chapter on general technique. X.P XStick to simple patterns where possible -- they search faster. Also, Xuse constant patterns where you can. When your NewsClip program is Xrun, your constant patterns (quoted strings to the right of a \fBhas\fP Xoperator) get converted into the internal regular expression language only once, Xinstead of each time a search is done. X.P XIn particular, the or-bar (\fB|\fP) regular expression feature is not very Xefficient. It can often be significantly faster to code: X.Bb Xbody has "foo" || body has "bar" || body has "abc.*def" X.Be Xthan X.Bb Xbody has "foo|bar|abc.*def" X.Be Xparticularly if you put the most likely patterns first. X X.H 2 "Patterns" X.P XDo be sure to watch out for the regular expression ``metacharacters.'' XThese are ``\fB^$.[]()+?|\\*\fP''. If you're an \fBed\fP or X\fBgrep\fP user, this will be second nature to you, although you Xshould still watch out for the extra \fBegrep\fP characters, particularly Xthe parentheses, plus, question mark and or-bar. X.P XIf you wish to store a literal string in an array or database for later Xuse in searching, you may wish to apply the string function X\fBliteral\_pattern\fP to it. This is always wise if you're taking Xsomething like a subject line, which could contain all sorts of Xcharacters. 
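X.P
XTo show what turning a string into a literal pattern involves, here is a
Xsmall C sketch that puts a backslash in front of each of the metacharacters
Xlisted above. It is not the source of \fBliteral\_pattern\fP; the name
X\fBescape_pattern\fP is hypothetical, and the sketch assumes that a
Xbackslash is what makes a metacharacter literal in the pattern language.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of src in which every regular
 * expression metacharacter is preceded by a backslash.
 * Returns NULL if memory cannot be allocated; the caller frees the result. */
char *escape_pattern(const char *src)
{
    static const char meta[] = "^$.[]()+?|\\*";   /* the set listed in the manual */
    size_t len = strlen(src);
    char *out = malloc(2 * len + 1);              /* worst case: all escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *src != '\0'; src++) {
        if (strchr(meta, *src) != NULL)
            *p++ = '\\';                          /* escape the metacharacter */
        *p++ = *src;
    }
    *p = '\0';
    return out;
}
```

XA subject line such as ``Re: C++ (was: a.b)?'' contains six metacharacters,
Xso searching for it unescaped would not look for the literal text.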
X X.H 2 "Databases" X.P XIf you regularly search for a string array in a database, such as the Xpopular search for \fBreferences\fP in a database of bad message-ids, then Xonly the first entry found will get its ``access time'' updated. If the Xwhole \fBreferences\fP array is found in the database, only the first Xwill get marked as accessed. X.P XThis means that the later IDs will eventually fade away from the database. XThis should not present a problem, since they will all be children of Xthe parent ID in normal circumstances. X.P XIf this could cause a problem, you will have to write your own \fBin\fP Xfunction, which performs a loop, and doesn't stop after an entry is found. XThis will update all entries, but it might take a bit longer. X X.H 2 "Working With Newsreaders" X.P XSome newsreaders, like RN, have a powerful macro language. You will find Xthat it is possible in RN to define macros that will do automatic updates Xof your databases of bad messages, bad users, good or bad subjects or Xwhatever you please. If you build your NewsClip program from a Xseries of \fB#include\fPd group files, you can even set up macros to Xdo automatic edits of those files when desired, and then recompile the Xwhole thing with a \fBMake\fP file. See the RN manual for details. X.P XYou can also issue commands directly to your NewsClip program Xdirectly from a modified reader like RN. See our special appendix on Xthat topic. X X.H 2 "Kill Files" X.P XExactly duplicating the kill file interface of RN is not simple, although Xit can be done. The interface in NewsClip is of course, much more Xflexible. RN's kill files can issue commands on articles that match Xheaders in the subject line, entire header and body. It's Xeasy to do pattern searches in the subject or article body with NewsClip. XYou can't search the entire header, but the RN header search was only Xprovided to simplify the KILL file interface. 
X.P XIf you want something that's like a kill file, just read a local KILL Xdatabase for your newsgroup and say: X.Bb Xreject if subject has killdb; X.Be Xor X.Bb Xreject if body has killdb; X.Be XIf you want to keep it all in one database, you could read in the Xdatabase, and then do a loop splitting the database into a bunch of Xdifferent arrays or databases of patterns, using the integer key values. X X.H 2 "Variant Parsing" X.P XYou may not wish to have your header lines handled the same way in Xevery newsgroup. For example, in one newsgroup you might wish the X\fBkeywords\fP line to be delimited with spaces, and in another you Xmight wish commas. (Normally it uses commas.) X.P XYou can't do that with the normal header variable declaration system, Xas the parsing of the header variables is done before you get to process Xthe article yourself. X.P XThe solution is to define your header variables as simple strings, as in: X.Bb Xheader string keywords : "keywords"; X.Be Xand then parse the string yourself. For example: X.Bb Xstring array keys; Xswitch( main_newsgroup ) { X case #rec.humor.funny: X parse keys = keywords, "S,"; X accept if "laugh" in keys; X break; X default: X parse keys = keywords, " "; X if( keys has "^foo" ) X adjust 20; X break; X } X.Be X X.H 2 "Feeding Sites" X.P XIf you use NewsClip's \fIbatch\fP mode to feed other sites (or users) Xfrom a \fB.newsrc\fP file, you must be sure to include the group X``control'' in the list of subscribed groups. This will pass control Xmessages (cancellations of articles etc.) to your feed site. X.P XWhile it should usually do little harm to pass all control messages, you Xmay wish to filter them further. The ``control'' group is unusual, in Xthat the groups on the \fBNewsgroups:\fP line will not include \fBcontrol\fP, Xbut will rather be the groups to which the control message applies. X.P XYou may wish to forward control messages only if they include a group you Xalready subscribe to. 
The \fBnewsrc\_group\fP function tells you if a group Xwas one of those listed in the \fB.newsrc\fP file. You may also wish Xto include hierarchies of control messages to catch new group creation Xmessages. You may wish to filter out boring ``ihave/sendme'' protocol Xcontrol messages by looking at the control line. X.P XNewsgroup creation messages get posted to the special pseudo-group, X``\fIgroupname\fP.ctl.'' Thus the creation message for ``comp.misc'' Xwas ``posted'' to ``comp.misc.ctl'' -- watch for that. Special control Xmessages may also be posted to fake groups that end in ``.ctl.'' This Xmeans you may wish to use pattern matching on your newsgroup names instead Xof the usual exact match schemes. X.P XIf you catch a creation message that you want to propagate, you may also Xwish to add the created group to your \fB.newsrc\fP file. Use the X\fBsubscribe\fP procedure to do this. X.P XFeeding with a \fB.newsrc\fP has some powerful advantages. For example, Xit's easy to have a complex subscription list. You can even combine together Xall the \fB.newsrc\fP files from the remote site, add ``control'' and build Xa file that only sends what is actually read. X X.H 2 "Examples" X.P XHere are some examples of how to code for common actions. Some of these Xexamples are conditional expressions, which you can then use in \fBif\fP, X\fBreject if\fP or \fBaccept if\fP statements, as desired. In most Xcases, these examples are code fragments, and not complete programs. It Xis assumed that they exist within larger programs. (For example it's Xpointless to have a program that just does \fBaccept if\fP, as \fBaccept\fP Xis the default action.) 
X X.H 3 "My Own Articles" X.P XTo see your own articles and all followups to them: X.Bb Xdatabase myarticles; Xextern string message\_id; Xextern userid from; Xextern string array references; Xprocedure init() X{ X myarticles = read\_database( "~./News/myarts" ); X} Xprocedure article() X{ X extern string my\_mail\_address; X if( from == my\_mail\_address ) { X myarticles[message\_id] = true; X accept; X } X if( references != nilarray && references in myarticles ) X accept; X /* more code */ X} Xprocedure terminate() X{ X extern datetime time\_now; X write\_database( myarticles, "~./News/myarts", time\_now - month ); X} X.Be X X.H 3 "Local Articles" X.P XShow me articles by people from my site: X.Bb Xextern userid from; X{procgap} Xextern string my\_domain; Xextern string domain( string ); Xaccept if domain( from ) == my\_domain; X.Be X.H 3 "Locally Distributed Articles" X.P XShow me articles posted for citywide distribution or smaller: X.Bb Xextern int distribution\_level; Xextern int dlevel( newsgroup ); Xaccept if distribution\_level <= dlevel(#city); X.Be X.P XYou may want to filter by distribution based on the group. In some groups Xyou might want to read the whole netwide stream, and in others you might Xwant to read only the local stream. In some groups, you might even want to Xeliminate the local stream. X.H 3 "Crossposting" X.P XAn article might be considered too heavily crossposted if X\fBcount(newsgroups) > 4\fP. On the other hand, you might decide in Xsome groups to only read articles unique to the group with: X.Bb Xcase #news.admin: X reject if count(newsgroups) > 1; X break; X.Be X.P XYou might want to be a bit more lenient than that. The following code: X.Bb Xextern newsgroup main_newsgroup; Xreject if main_newsgroup != newsgroups[0]; X.Be Xrejects articles where the primary newsgroup isn't the one you Xare currently processing. This means messages that were posted to your Xgroup as a possible afterthought. 
You might wish to give them a lower Xscore or reject them out of hand. Of course, if you do subscribe to Xthe primary newsgroup (first on the \fBnewsgroups\fP list), then you Xwill still see the article in that group. If you don't subscribe, you Xwon't see it at all. X.H 3 "Eliminating a User" X.P XYou can eliminate a list of users from ``your'' net, so that you don't Xsee their articles, and you don't even see followups to their articles. X.Bb Xdatabase badusers; Xdatabase badarticles; Xextern string message\_id; Xextern userid from; Xextern string array references; Xprocedure init() X{ X badusers = read\_database( "~./News/badusers" ); X badarticles = read\_database( "~./News/badarts" ); X} Xprocedure article() X{ X /* does it come from a nasty user? Mark it */ X if( from in badusers ) { X badarticles[message\_id] = true; X reject; X } X reject if references != nilarray && references in badarticles; X /* more code */ X} Xprocedure terminate() X{ X extern datetime time\_now; X write\_database( badarticles, "~./News/badarts", time\_now - month ); X} X.Be X.H 4 "\fIReally\fP Eliminating a User" X.P XThere are still many sites out there that don't build proper X\fBreferences\fP chains on their articles. To really eliminate followups Xto an article, you have to do more than add the message id to a database of Xbad messages. If the article is an original, with no ``Re:'' at the Xfront of the subject, you should also add the subject line to a Xdatabase of bad subjects. X.P XAnd if you want to get really fancy, you could have your program search Xarticle bodies for mentions of the user's name. X.H 4 "If you Eliminate a User" X.P XIf you decide that you would be better off eliminating the postings of Xa USENET user, it would be a good idea to send a brief mail note to this Xuser indicating that you have done so, possibly including the reason Xwhy. 
X.P XSome users who make annoying mistakes on USENET may not realize that Xthey are making mistakes, or they may not realize the extent to which Xthey are annoying people. If they are informed that some readers have Xdecided to read no more of their writing, they may decide to change Xtheir behavior. That is up to the poster, of course. X.H 3 "Included Text & Signatures" X.P XYou may not like long rebuttal articles with lots of included text. XIn some groups, you could then include: X.Bb Xextern int lines; X{procgap} Xreject if lines > 50 && lines / line\_count( included ) < 2; X.Be Xwhich rejects long articles that are more than half included text. X.P XYou could also reject (or lower the score of) articles that are short Xand have big signatures. X.Bb Xextern int lines; X{procgap} Xreject if lines < 30 && line\_count( signature ) > 9; X.Be XTo get fancy, you could have an \fBif\fP statement add the posters of Xsuch articles to your \fBbadusers\fP database (see above) so that you Xnever hear from them again! In this case you would have to write out Xyour \fBbadusers\fP database at the end of the session. X.H 3 "Followups" X.P XIn some groups, it's better to just ignore the followups. Try X.Bb Xextern int followup; X/* big group switch */ Xcase #rec.humor: X reject if followup; X break; X.Be XYou might not be so harsh, but instead just lower the score or apply Xfurther tests before allowing followups to make it through. X.P XAnother idea is to ignore followups except in the main group on the Xnewsgroup list. Try this: X.Bb Xextern int followup; Xextern newsgroup main\_newsgroup; Xreject if followup && main\_newsgroup != newsgroups[0]; X.Be X.H 3 "Two Out of Three Ain't Bad" X.P XYou can use integer arithmetic in combination with the fact that Xconditional expressions return 1 for true and 0 for false. 
To accept Xan article that has 2 out of 3 keywords in the subject: X.Bb Xextern string subject; X{procgap} Xaccept if (subject has "baz") + (subject has "bar") + (subject has "foo") > 1; X.Be X.H 3 "Patterns of Groups" X.P XYou can get pretty fancy with what you do with crossposted articles. In Xfact, with the right use of NewsClip, crossposting could be a good Xthing. Say you want to only see space articles that also pertain to Xastronomy. You could either use \fBis sci.space && is sci.astro\fP in Xa general expression, or if you use a \fBswitch\fP, you could say: X.Bb Xcase #sci.astro: X reject if !is sci.space; X.Be XLikewise you could say: X.Bb Xcase #rec.humor: X reject if is talk.bizarre; X.Be Xto eliminate only the messages crossposted to that other group. No Xdoubt \fBreject if is comp.sys.atari.st && is comp.sys.amiga\fP will Xbe popular! Likewise, if people are kind enough to crosspost to X``alt.flame'', that lets you control whether you read the article or not. X.P XUse boolean logic on groups to your heart's content. X X X X.H 1 "Debug & Testing" X.P XAll programs of any complexity will have bugs, and yours will be Xno exception. Your bugs may simply cause articles to be accepted or Xrejected improperly, or they may cause your filter program to crash, Xeither through an infinite loop or an exception. X.H 2 "Segmentation Fault" X.P XThe most frustrating thing to see can be the message ``segmentation fault.'' X(Sometimes ``memory fault.'') XThis means, on Unix, that your program has tried to use memory Ximproperly. This is often the result of an attempt to reference an Xarray, string or userid that has a \fBnil\fP value. X.P XYou must remember that before you ever reference data in an array or Xstring that might not be defined, you must check that it is defined. X.P XThere is a difference between \fBnilstring\fP and the empty string X(\fB""\fP). 
For example, if you use the \fBsummary\fP header variable, Xit will be \fBnilstring\fP if the header wasn't there, and \fB""\fP if Xthe header was there, but the summary was blank. X.P XThe same is true for nil arrays. \fBnilarray\fP isn't the same as an Xarray with no elements. For your protection, the current release of XNewsClip has the \fBin\fP and \fBhas\fP operators treat \fBnilarray\fP Xas an empty array, but this is not guaranteed to work in future releases. X.P XWe do allow a nil database to be the same as an empty database when it Xcomes to looking in the database, but you can't use a nil database for Xstoring into -- you could get that ``segmentation fault.'' X.P XOther causes of this error include: array indices that are out of bounds, Xor a character index beyond the end of a string. X.P XAlways beware of the most common cause, which is the use of a variable Xthat has not yet been assigned a value. X X.H 2 "Debuggers" X.P XIf you can't figure out the immediate cause of a problem like this, and Xyou are a C programmer, Unix has many debugging tools available to help Xwith this sort of problem. X.P XThe C source produced by \fBncc\fP is fairly readable, and you should Xbe able to readily tell what line of the output C program corresponds Xto a statement in your NewsClip program. Use the \fI-l\fP option Xof \fBncc\fP to generate a standalone C program. You can then Xcompile and link it with the \fBcliplib.a\fP library yourself, using Xwhatever debug options you desire. X X.H 2 "Dprintf" X.P XNewsClip contains a special procedure called \fBdprintf\fP. This acts Xjust like the \fBprintf\fP function from C, except it prints to the Xstandard error output. It takes a variable number of arguments, from X1 to 5. These can be strings, ints or dates. See the man page for X\fBprintf\fP for full details. X.P XInsert debugging print statements in your programs so you can figure out Xwhat's going on and what values are being assigned to variables. 
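X.P
XFor readers who know C, the behaviour described for \fBdprintf\fP resembles
Xa thin wrapper around the standard \fBvfprintf\fP routine. The sketch below
Xis an assumption about the general idea, not the library's actual source;
Xit is named \fBdebug_printf\fP (a hypothetical name) to avoid clashing with
Xany routine called \fBdprintf\fP on your system, and it does not enforce
Xthe limit of five arguments.

```c
#include <stdarg.h>
#include <stdio.h>

/* printf-style output sent to the standard error stream.
 * Returns the value reported by vfprintf: the number of characters
 * written, or a negative value on error. */
int debug_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);  /* stderr keeps debug text out of stdout */
    va_end(ap);
    return n;
}
```

XBecause the text goes to the standard error output, it does not mix with
Xthe list of accepted filenames that \fIfilter\fP mode writes to the
Xstandard output.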
X.P XPlease note that you can't print variables of type \fBnewsgroup\fP Xor \fBuserid\fP. Assign such values to strings first. Alternatively, you Xcan print newsgroups with the ``%d'' code, which will give the newsgroup Xnumber. X X.H 2 "Warning Level" X.P XYou can set the warning level for your NewsClip programs with the X\fIwarning=num\fP option. Provide a number. The higher the number, Xthe more warnings you get. The default level is 1, and currently Xwarnings exist at levels 0 through 4. Select a high number like 100 Xto get all warnings. X.P XYou will be warned about conditions that are normally considered OK, Xsuch as the reading of a non-existent database file, but you may also Xlearn some useful debugging information. X X.H 2 "Trial Runs" X.P XTo test and debug your programs, use the \fIfilter\fP or \fIlist\fP modes Xof operation. We suggest \fIfilter\fP for preliminary testing. X.P XTo do this, prepare a list of article filenames, either with articles Xmade up by you or live articles on your system. Use absolute pathnames Xif possible. Start perhaps with only one article in the list. Run: X.Bb Xnclip m=filter