Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!ox.com!ox.com!emv
From: emv@ox.com (Ed Vielmetti)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: NetFind and its Internet load
Message-ID: <EMV.91May7034036@poe.aa.ox.com>
Date: 7 May 91 07:40:39 GMT
References: <1991May2.180737.29852@csn.org> <EMV.91May3032230@poe.aa.ox.com>
	<1991May6.173923.174@colorado.edu>
Sender: usenet@ox.com (Usenet News Administrator)
Organization: OTA Limited Partnership, Ann Arbor MI.
Lines: 150
In-Reply-To: schwartz@latour.colorado.edu's message of 6 May 91 17:39:23 GMT

In article <1991May6.173923.174@colorado.edu> schwartz@latour.colorado.edu (Mike Schwartz) writes:

   > My major problem with tools like NetFind is that although they address
   > the "resource discovery" problem for a single user, they don't have any
   > positive side-effects for the rest of the internet.  

   Maybe this is too simplistic of an interpretation, but it seems your
   argument boils down to the fact that NetFind is basically a client of
   existing services, rather than a new service in its own right (like
   X.500).  

It may be just a matter of terminology; if NetFind were billed as just
a souped-up version of finger, then it could be evaluated in the
context of being basically a client of other services.  But with the
claims of it being a "Semantically Cognizant Internet White Pages
Directory Tool" with the ability to reach "1,147,000 users in 1,929
administrative domains", when it's mentioned in the same breath as
X.500 projects and as an alternative to them, something about it calls
for a more critical examination.

Just to qualify the numbers -- 1,147,000 reachable users is the product
of 1,929 reachable domains, an average of 119 hosts per domain (mean
based on a sample size of 75), and a "conservative estimate" of 5
users per host.  I don't see any
breakdown of success rate by type of domain; notably, the only success
numbers I could find (80+% hit rate by day, 70+% by night) don't
attempt to measure success to the 40% of the database that's not in
the USA.  Perhaps there are a million people out there; I'm not
convinced that you can actually find that many of them.
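
As a quick check of where the headline figure comes from, here is a
small sketch of my own (not anything from the paper) that multiplies
the three quoted factors.  The variable and function names are mine.

```python
# Figures as quoted above from the paper.
DOMAINS = 1929            # reachable administrative domains
HOSTS_PER_DOMAIN = 119    # mean, based on a sample of 75 domains
USERS_PER_HOST = 5        # the paper's "conservative estimate"


def reachable_users(domains, hosts_per_domain, users_per_host):
    """Multiply the three factors to get the headline user count."""
    return domains * hosts_per_domain * users_per_host


total = reachable_users(DOMAINS, HOSTS_PER_DOMAIN, USERS_PER_HOST)
print(total)  # 1147755
```

The product is 1,147,755, which is evidently where "1,147,000" comes
from.  Every digit past the first or second rests on the 119 and the 5.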

The performance figures didn't correct for sample bias in the
observer; it would be expected that the author would look for people
in a field related to his (computer science).  Since computer science
departments are often those in charge of running the name servers on
campus, the particular happy accident of the search algorithm relying
on SMTP lookups to the primary name servers may work overly well for
CS department searches.  It is less likely to work well for lookups on
people who are more peripheral to the campus network infrastructure.
An interesting exercise would be to run NetFind against the names of
10 senior librarians, 10 junior physics faculty members, 10
mathematics graduate students, and 10 undergraduate French majors,
suitably scattered about; I have some guesses as to how well your
results will turn out.

(In truth none of the numbers tossed around in the paper are especially
convincing; it would have been appropriate to qualify estimated packet
counts and user counts with estimated error ranges.  It's not possible
for me to justify 1,147,000 users any more than 1,146,000 users; a
more plausible figure is "on the order of a million users". That's
especially true without a good rationale for picking 5 users per host,
a figure which appears out of the blue with absolutely no
references....)
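
To show how much the total leans on that one unreferenced figure, here
is a rough sensitivity sketch.  The alternative values of 3 and 8 users
per host are purely hypothetical, picked by me for illustration; they
do not come from the paper or from any measurement.

```python
DOMAINS = 1929           # quoted from the paper
HOSTS_PER_DOMAIN = 119   # quoted from the paper (mean of 75 samples)


def estimate(users_per_host):
    """Estimated reachable users for a given users-per-host figure."""
    return DOMAINS * HOSTS_PER_DOMAIN * users_per_host


# 5 is the paper's figure; 3 and 8 are hypothetical alternatives.
for users_per_host in (3, 5, 8):
    print(users_per_host, estimate(users_per_host))

# Change in the total for each additional assumed user per host.
step = estimate(6) - estimate(5)
print(step)  # 229551
```

Moving the guess by a single user per host shifts the total by
229,551, so a difference of a thousand users either way is well below
what the estimate can resolve.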

I note that your paper shows (fig 3) that usage of your NetFind
prototype tapers off to an average of one use every two weeks.  There
is no indication from the study why usage dropped so sharply from
the original high average of 7 uses in the first day, or why it drops
so far below the estimated 10 searches per week (quoted from RFC
1107).  Given the expectation of relatively static communities of
interest and the ready availability of e-mail address information of
potential colleagues by alternative access methods (business cards,
telephone calls, private mailing lists, netnews) it's not surprising
to me that the need for zero-prior-knowledge user lookup information is
lower than 1/day.  But given that the usage trails off to almost nil
after 200 days of use, it would seem to call into question the
long-term usability of your product.  Have you done any retrospective
work to determine why usage dropped to such low levels?

   ... I have found that if someone is "reachable" by NetFind, it is
   usually pretty easy to find them with NetFind.

That's hard to argue with.  But it doesn't yield any insight into what
makes people hard to locate, or how to design campus and corporate
information systems so that people can easily be found without
resorting to extraordinary sleuthing measures.  You casually write off
(in section 5, Related Work) the efforts of campuses to provide local
X.500 services which are accessible via finger; though it's not
directly germane to your research, it would have at least been useful
to point out that X.500 servers can be deployed within the existing
system to good effect.  Stick an X.500 system at yourdomain.org with
a big pile of user names in it, make it so finger@yourdomain.org does
the right thing, and for large institutions like UIUC, UMich, MIT you
have a larger problem solved than trying to chase pointers through a
domain hierarchy.  Granted, the information is somewhat more stale and
less likely to be exactly true; but I think it's arguable that
zero-knowledge searches are looking more for a pointer to information
than an exact match.  (e.g. finger vielmetti@umich.edu and you'll get
something, but you might have to chase it down a bit to find out from
a human that I've moved recently.)
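
For what it's worth, the client side of such a lookup is tiny.  A
minimal sketch, assuming the usual finger convention of a TCP
connection to port 79 that carries the user name followed by CRLF; the
function names are mine, and the sketch says nothing about what any
particular site's server will send back.

```python
import socket

FINGER_PORT = 79  # well-known TCP port for the finger service


def split_target(target):
    """Split 'user@host' into (user, host) at the last '@'."""
    user, _, host = target.rpartition("@")
    return user, host


def build_query(user):
    """A finger query is the user name terminated by CRLF."""
    return user.encode("ascii") + b"\r\n"


def finger(target, timeout=10.0):
    """Send one query to the host in 'user@host' and return the reply."""
    user, host = split_target(target)
    with socket.create_connection((host, FINGER_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(user))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

Calling finger("vielmetti@umich.edu") would open a single connection to
the institution's front-end host; whatever sits behind it (X.500 or
otherwise) is invisible to the client, which is the point.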

   As an aside, searching for more general types of resources (like
   anonymous FTP files) is a harder problem, and the architecture I
   use for that project does utilize the results of previous users'
   searches in facilitating future users' searches.

Yes, I've read the paper; can't say that it compares with a service
like "archie", though, even if the software were available.  My
reactions to that paper can wait for another message. I'm not
impressed with the amount of effort you've spent on seeing how people
have really addressed the problem;  in particular, your success rates
for scanning the net for interesting information are skewed because
I'm doing it for you already....

   I think your objection to a tool only helping one user at the time of
   use, without contributing to other users by its specific use, is really
   wrong.  If this were the standard against which all software was
   compared, we would get rid of most of the software in the world.

    - Mike Schwartz
      Dept. of Computer Science
      Univ. of Colorado - Boulder

I think my point is valid.  For me to want to let you accomplish a
particular task on the Internet (a shared, finite resource) you need
to justify to me that it's worth it to let you interpose your packets
in the way of my packets on the way to their destination.  I will be
unwilling to do this unless I'm generous or unless I can see some
benefit (or very low cost) from you doing so.  Remember that your use
of the net is generally going to make my use of the net marginally
slower, less convenient, and more risky, unlike e.g. your use of an
editor on your local system.  That's the story of negative
externalities and the "tragedy of the commons"; everyone does a little
thing that's convenient for them but which causes the playground to be
littered.  (e.g. the cutoff of nudie pictures on USA FTP sites causing
the saturation of the USA-Finland internet link, and the subsequent
barrage of traffic in alt.sex.pictures).  Provide me with something
useful, a scrap of code I can use or a good idea to work with, and
I'll let you go about your business.

NetFind does seem to pose certain risks to the rest of the net; you
could be very efficiently bombarding my slow links on a wild goose
chase trying to find someone somewhere else.  In truth, I'm sure that
the tradeoff is positive, and that I would be quite happy if just one
person somewhere used NetFind to find me.  A more salient risk is that
successful efforts like NetFind would lead people to believe that
generating queryable information collections a la X.500 is not
necessary in the long run and that we'd be content with ad hoc
solutions.

[
The paper I'm referring to is ftp'able as
	latour.colorado.edu:/pub/RD.Papers/White.Pages.ps.Z
]

-- 
Edward Vielmetti, vice president for research, MSEN Inc. 	emv@msen.com

"often those with the power to appoint will be on one side of a
controversial issue and find it convenient to use their opponent's
momentary stridency as a pretext to squelch them"