Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!ox.com!ox.com!emv
From: emv@ox.com (Ed Vielmetti)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: NetFind and its Internet load
Message-ID:
Date: 7 May 91 07:40:39 GMT
References: <1991May2.180737.29852@csn.org> <1991May6.173923.174@colorado.edu>
Sender: usenet@ox.com (Usenet News Administrator)
Organization: OTA Limited Partnership, Ann Arbor MI.
Lines: 150
In-Reply-To: schwartz@latour.colorado.edu's message of 6 May 91 17:39:23 GMT

In article <1991May6.173923.174@colorado.edu> schwartz@latour.colorado.edu (Mike Schwartz) writes:

> My major problem with tools like NetFind is that although they address
> the "resource discovery" problem for a single user, they don't have any
> positive side-effects for the rest of the internet.

Maybe this is too simplistic of an interpretation, but it seems your argument boils down to the fact that NetFind is basically a client of existing services, rather than a new service in its own right (like X.500).

It may be just a matter of terminology; if NetFind were billed as just a souped-up version of finger, then it could be evaluated in the context of being basically a client of other services. But with the claims of it being a "Semantically Cognizant Internet White Pages Directory Tool" with the ability to reach "1,147,000 users in 1,929 administrative domains", when it's mentioned in the same breath as X.500 projects and as an alternative to them, something about it calls for a more critical examination.

Just to qualify the numbers -- 1,147,000 reachable users is 1,929 reachable domains, each with an average of 119 hosts (mean based on sample size of 75), with each of those hosts containing a "conservative estimate" of 5 users per host.
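The arithmetic behind that headline figure is simple enough to check by hand, and a short sketch shows how much it leans on the users-per-host guess. The three inputs below are the ones quoted above; the alternative users-per-host values (2 and 10) are purely hypothetical, chosen only to illustrate the sensitivity, and do not come from the paper.

```python
# Inputs as quoted in the post.
DOMAINS = 1_929          # reachable administrative domains
HOSTS_PER_DOMAIN = 119   # mean, from a sample of 75 domains
USERS_PER_HOST = 5       # the paper's "conservative estimate"


def reachable_users(domains, hosts_per_domain, users_per_host):
    """Return the product of the three factors in the estimate."""
    return domains * hosts_per_domain * users_per_host


estimate = reachable_users(DOMAINS, HOSTS_PER_DOMAIN, USERS_PER_HOST)
print(f"point estimate: {estimate:,}")

# Hypothetical alternatives for users per host (not from the paper),
# to show how directly the total scales with that one assumption.
for users in (2, 5, 10):
    total = reachable_users(DOMAINS, HOSTS_PER_DOMAIN, users)
    print(f"{users:>2} users/host -> {total:,} users")
```

The product comes to 1,147,755, which matches the quoted 1,147,000 after truncation. Since the total is linear in each factor, any uncertainty in the users-per-host figure carries straight through to the result.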
I don't see any breakdown of success rate by type of domain; notably, the only success numbers I could find (80+% hit rate by day, 70+% by night) don't attempt to measure success against the 40% of the database that's not in the USA. Perhaps there are a million people out there; I'm not convinced of how many of them you can find.

The performance figures didn't correct for sample bias in the observer; it would be expected that the author would look for people in a field related to his (computer science). Since computer science departments are often those in charge of running the name servers on campus, the particular happy accident of the search algorithm relying on SMTP lookups to the primary name servers may work overly well for CS dept. searches. It is less likely to work well for lookups on people who are more peripheral to the campus network infrastructure. An interesting exercise would be to run NetFind against the names of 10 senior librarians, 10 junior physics faculty members, 10 mathematics graduate students, and 10 undergraduate French majors, suitably scattered about; I have some guesses as to how well your results will turn out.

(In truth none of the numbers tossed around in the paper are especially convincing; it would have been appropriate to qualify estimated packet counts and user counts with estimated error ranges. It's not possible for me to justify 1,147,000 users any more than 1,146,000 users; a more plausible figure is "on the order of a million users". That's especially true without a good rationale for picking 5 users per host, a figure which appears out of the blue with absolutely no references....)

I note that your paper shows (fig 3) that usage of your NetFind prototype tapers off to an average of one use every two weeks. There is no indication from the study why usage dropped so sharply from the original high average of 7 uses in the first day, or why it drops so far below the estimated 10 searches per week (quoted from RFC 1107).
Given the expectation of relatively static communities of interest and the ready availability of e-mail address information of potential colleagues by alternative access methods (business cards, telephone calls, private mailing lists, netnews), it's not surprising to me that the need for zero-prior-knowledge user lookup information is lower than 1/day. But given that the usage trails off to almost nil after 200 days of use, it would seem to call into question the long-term usability of your product. Have you done any retrospective work on determining why usage dropped to such low levels?

... I have found that if someone is "reachable" by NetFind, it is usually pretty easy to find them with NetFind.

That's hard to argue with. But it doesn't yield any insight into what makes people hard to locate, or how to design campus and corporate information systems so that people can easily be found without resorting to extraordinary sleuthing measures.

You casually write off (in section 5, Related Work) the efforts of campuses to provide local X.500 services which are accessible via finger; though it's not directly germane to your research, it would have at least been useful to point out that X.500 servers can be deployed within the existing system to good effect. Stick an X.500 system at yourdomain.org with a big pile of user names in it, make it so finger@yourdomain.org does the right thing, and for large institutions like UIUC, UMich, MIT you have a larger problem solved than trying to chase pointers through a domain hierarchy. Granted, the information is somewhat more stale and less likely to be exactly true; but I think it's arguable that zero-knowledge searches are looking more for a pointer to information than an exact match. (e.g. finger vielmetti@umich.edu and you'll get something, but you might have to chase it down a bit to find out from a human that I've moved recently.)
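For readers who have not used it, a finger lookup of the kind mentioned above is a very small exchange: open a TCP connection to port 79, send the user name followed by CRLF, and read until the server closes the connection (RFC 1288). The following is a minimal sketch of such a client. The function names `split_target` and `finger` are made up for illustration, and nothing here assumes that any particular host still answers finger queries.

```python
import socket
from typing import Tuple

FINGER_PORT = 79  # well-known finger port, per RFC 1288


def split_target(target: str) -> Tuple[str, str]:
    """Split 'user@host' into (user, host); the user part may be empty."""
    user, sep, host = target.rpartition("@")
    if not sep or not host:
        raise ValueError(f"expected user@host, got {target!r}")
    return user, host


def finger(target: str, timeout: float = 10.0) -> str:
    """Send one finger query and return the server's reply as text."""
    user, host = split_target(target)
    with socket.create_connection((host, FINGER_PORT), timeout=timeout) as conn:
        # The query is just the user name terminated by CRLF.
        conn.sendall(user.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1", errors="replace")
```

Calling `finger("vielmetti@umich.edu")` would attempt the lookup described in the text. What comes back depends entirely on what the site has chosen to put behind its finger service, which is the point being made: the directory can sit behind an interface people already know.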
As an aside, searching for more general types of resources (like anonymous FTP files) is a harder problem, and the architecture I use for that project does utilize the results of previous users' searches in facilitating future users' searches.

Yes, I've read the paper; can't say that it compares with a service like "archie", though, even if the software were available. My reactions to that paper can wait for another message. I'm not impressed with the amount of effort you've spent on seeing how people have really addressed the problem; in particular, your success rates for scanning the net for interesting information are skewed because I'm doing it for you already....

I think your objection to a tool only helping one user at the time of use, without contributing to other users by its specific use, is really wrong. If this were the standard against which all software was compared, we would get rid of most of the software in the world.

- Mike Schwartz
  Dept. of Computer Science
  Univ. of Colorado - Boulder

I think my point is valid. For me to want to let you accomplish a particular task on the Internet (a shared, finite resource) you need to justify to me that it's worth it to let you interpose your packets in the way of my packets on the way to their destination. I will be unwilling to do this unless I'm generous or unless I can see some benefit (or very low cost) from you doing so. Remember that your use of the net is generally going to make my use of the net marginally slower, less convenient, and more risky, unlike e.g. your use of an editor on your local system.

That's the story of negative externalities and the "tragedy of the commons"; everyone does a little thing that's convenient for them but which causes the playground to be littered. (e.g. the cutoff of nudie pictures on USA FTP sites causing the saturation of the USA-Finland internet link, and the subsequent barrage of traffic in alt.sex.pictures).
Provide me with something useful, a scrap of code I can use or a good idea to work with, and I'll let you go about your business.

NetFind does seem to pose certain risks to the rest of the net; you could be very efficiently bombarding my slow links on a wild goose chase trying to find someone somewhere else. In truth, I'm sure that the tradeoff is positive, and that I would be quite happy if just one person somewhere used NetFind to find me. A more salient risk is that successful efforts like NetFind would lead people to believe that generating queryable information collections a la X.500 is not necessary in the long run and that we'd be content with ad hoc solutions.

[ The paper I'm making references to is ftp'able as latour.colorado.edu:/pub/RD.Papers/White.Pages.ps.Z ]

--
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
"often those with the power to appoint will be on one side of a controversial issue and find it convenient to use their opponent's momentary stridency as a pretext to squelch them"