Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!linus!decvax!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news
Subject: Re: keyword-based news (final comments! really!)
Message-ID: <839@vortex.UUCP>
Date: Mon, 14-Oct-85 15:33:51 EDT
Article-I.D.: vortex.839
Posted: Mon Oct 14 15:33:51 1985
Date-Received: Tue, 15-Oct-85 08:05:22 EDT
References: <3253@nsc.UUCP>
Organization: Vortex Technology, Los Angeles
Lines: 182

The temptation to ignore Chuqui's last outburst is considerable.
While I've certainly not hidden my feelings that keyword-based news
schemes are not appropriate for Usenet, I can't recall ever saying
that Chuqui or others shouldn't work on them if they so desire. Nor
can I recall calling him or others "idiots," "pigs," or other similar
terms that he found it necessary to use in what he himself called his
"obligatory childish behavior."

Upon reflection, I suspect that the biggest problem is that many
persons are simply not familiar with the work already done, and the
problems already encountered, in the areas of query/response and
keyword systems. It isn't as if they're a new invention. Such systems
have existed for quite a long time, and a considerable body of
published work exists that clearly points out the positive and
negative aspects of such systems. One previous poster on this topic
mentioned some of the formal terms of these systems--I've steered
clear of the formal terminology since I figured most people wouldn't
be too interested, but perhaps some formal discussion of the theory
and practice of these systems is in order at some point.

In any case, let's briefly address a few issues from Chuqui's message:

> My hope on going public with my NNTN project was to try to get some
> reasonable feedback.
> As seems to be typical of most of the network, and of
> Lauren in particular, all I've gotten are rather childish attempts at
> minimizing any attempt to do something positive for this beast we
> laughingly call a network.

How many people remember some of the incredible pressure I was under
when I first proposed Stargate? It makes the sorts of messages we've
seen here on the topic of keywords seem like nothing by comparison.

I never saw any message that said, "People who work on keyword systems
are idiots." What I did see were messages (some of them written by me)
that said, "Keyword systems (as described) won't work in the Usenet
environment and may create far more problems than they would solve."
Personal opinion, to be sure. Not based on a desire to see Usenet die,
but rather on the desire to avoid seeing new problems created.

> Well, there is an implied "Chuqui's a young whippersnapper, listen to me
> because I've been solving these problems since he was in diapers" comment in
> there. Well, I could make a snide comment about old and senile hackers, but
> that doesn't contribute to the situation... I'll even agree that his
> arguments aren't new. Old arguments aren't necessarily right, they're just
> old....

Then again, old arguments might be right, too. Ignoring history is
often a serious mistake. When topics have been discussed in the past,
and when a body of technical work concerning the topic of interest
already exists, a great deal of time can be wasted if a person chooses
to simply ignore all that has come previously. Whether this is done on
purpose or through naivete doesn't much matter--the result is usually
the same. An important issue in such situations is how much time is
spent "re-inventing the wheel," only to come up against the same old
problems. Another issue concerns whether or not well-intentioned
efforts that might have a short-term benefit create additional
long-term problems.

> Stargate...

Stargate is designed to be a medium and long-term alternative for
collecting and transmitting information. It isn't meant to provide
exactly the same sorts of "services" we get now from Usenet.

As an aside, the project is going quite well, and I hope to have some
significant announcements regarding service organization and
availability in the fairly near future. I hope to have more hardware
available soon to allow more sites to receive Stargate
transmissions--mass production of the decoders is already underway,
and the prototype "buffer box" is under construction now. At the same
time, various non-technical discussions relating to the evolution of
the project from an experiment to a service continue. More in
net.news.stargate as developments warrant.

> I feel that "solutions" that DO include moderation may work on some nets,
> but won't work on USENET. You want a different net, fine, but I want the
> decision to read or not read an article in the hands of the reader. I
> CERTAINLY wouldn't want a newsgroup moderated by Lauren, if only because he
> and I disagree on everything and therefore the stuff I'd consider
> interesting wouldn't get in.... What we need to do is build a system that
> makes it easier to screen messages and less likely to mispost messages.

This is a fine short-term concept. And keywords used in ADDITION to
newsgroups might be of use in that area. My primary objection to
keyword systems appears when people want to REPLACE newsgroups with
keywords. In essence, newsgroups provide something that has been found
to be critical in real-world keyword systems--keyword list control.
That is, a newsgroup is a base keyword, chosen from a list that all
users are required to use, which provides a conceptual "anchor" for
the message. If they want to add additional keywords also (as some
people do now) that's OK...
people with the appropriate software may choose to use or ignore those
keywords (of course, they should keep in mind the keyword error
modalities we've discussed previously when making such a decision).

But without some sort of "forced keyword selection" (which is what
newsgroups really are) we're faced with a serious problem. When trying
to decide how to spend their time and money, sites are put at the
mercy of the keyword choices of individual users. Variation of
keywords has been shown to be one of the biggest problems with
uncontrolled keyword-based systems. Not only does it make it difficult
to find articles of interest, but it makes controlling what you DON'T
want to see very difficult. There are just TOO MANY KEYWORD
POSSIBILITIES in an uncontrolled system, even assuming that all users
choose keywords conscientiously, accurately, and in detail.

Enough on this point. I'll just add that virtually all "successful"
keyword-based systems have a central authority that controls keyword
use. Often this authority actually chooses the keywords, or else
"corrects" poor user keyword choices before letting articles enter the
database. Often official keyword lists are also published or otherwise
made available so that users will see what sorts of words will appear,
which allows intelligent usage of keywords on the part of both posters
and readers of articles. Keyword control is CRITICAL. I can't
emphasize this enough. I don't see any way to make this work in a
distributed environment such as Usenet. Newsgroups are the underlying
structure that holds the current net together, to the extent that it
holds together now.

> The only person I really trust as moderator is me, and I wouldn't trust me
> as moderator for anyone else...

This would be OK, if there were no costs associated with traffic.
Let's take an example. Let's say we had this "perfect" AI program that
could precisely and accurately filter all our incoming traffic and
show us ONLY what we TRULY wanted to see.
It's never fooled--it never makes mistakes. EVERYONE is running it (so
we don't have to worry about the poor slobs running older software who
have no way to do automatic filtering!) Would this solve Usenet's
problems?

Of course not. The problem is that getting the traffic to the AI
program costs money, time, and other (hardware) resources (disk,
dialups, CPU cycles, etc.) What happens when we have 100,000 sites on
the net? Can't happen? Well, probably not, for much the same reason
that we're unlikely to reach a population of 100 billion on this
planet--everything will collapse long before then. But traffic
continues to grow, and the percentage of useful traffic, by virtually
ANY reasonable definition, will continue to decline.

When ANYBODY, ANYWHERE on the net, can post a message to EVERYONE, we
start to look into the face of problems that will be nearly
exponential in nature. To take an old example: Someone asks the
network what "foo" means. What do we do when a few thousand people
respond? Or more? Even given the fancy AI program that can show us
only the "meaningful" responses--we've still paid to send all those
answers (99.9% of which will impart no new useful information)
throughout the world. As the net grows, this sort of behavior will
simply become impossible to support, from either a time or a resource
standpoint.

---

The course that Usenet is taking is becoming very clear. It isn't even
necessarily a BAD course--but it's in keeping with evolution. What
we're going to see is increased fragmentation. The recent announcement
by utzoo regarding newsgroup cutoffs is an example of such
fragmentation in action. As the volume of materials continues to grow,
more sites will be forced to make hard decisions about what they can
afford, under various criteria, to support.

To the extent that some sites feel wealthy enough to continue taking
full traffic in an ever-expanding network with tens of thousands of
sites, they will be free to do so.
After all, any site can arrange to call any other site and pass
whatever articles they wish.

Other sites may wish to try alternatives (e.g. Stargate) which will
hopefully offer a far more cost-effective and lower-noise information
flow. The model of rapid-turnaround "information conduits," with users
submitting items for "publication through the conduits," is the one I
like to use for Stargate. To the extent that people like or dislike
the way these conduits are managed, the service will evolve and
change. Persons with the resources to support all or part of the
free-for-all on Usenet can do so also, of course. Participating in
Stargate doesn't require giving up everything else. It will always be
up to the individual sites to make these decisions.

Usenet won't just DIE. But its nature will gradually continue to
change as more sites join the fray, and as the volume of postings
continues to increase. The noise level WILL continue to rise in an
unmoderated environment, and traffic will continue to grow rapidly to
the extent that backbone cutoffs do not occur.

Article filtering techniques may have some short-term benefit--but
only if they do not make it MORE difficult for sites and users to
accurately control their traffic, costs, and time. Traffic growth,
however, will prevent such techniques from being a long-term solution
to what are really systemic problems in Usenet itself--problems that
have appeared as netnews grew far beyond the size envisioned by those
who created it.

--Lauren--