Path: utzoo!attcan!uunet!lll-winken!ncis.llnl.gov!helios.ee.lbl.gov!nosc!ucsd!rutgers!elbereth.rutgers.edu!harnad
From: harnad@elbereth.rutgers.edu (Stevan Harnad)
Newsgroups: comp.ai
Subject: Re: Biological Categorization
Summary: On doing cognitive theory vs. doing ontology
Message-ID: <Jan.21.13.42.31.1989.10447@elbereth.rutgers.edu>
Date: 21 Jan 89 18:42:32 GMT
References: <681@cogsci.ucsd.EDU> <2959@uhccux.uhcc.hawaii.edu> <1007@husc6.harvard.edu>
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 188


reiter@endor.harvard.edu (Ehud Reiter) of Aiken Computation Lab
Harvard, Cambridge, MA wrote:

" [W]ho makes up the categories? Professional biologists [categories] are
" pretty good at predicting [biological] details, but much less useful at
" predicting more mundane attributes like edibility...  [H]ow useful
" modern English biological categories are to the average language user
" (as opposed to the professional biologist) may be questionable.

A persistent misunderstanding (or perhaps a divergence of interest) 
seems to be running through some aspects of this discussion. In my view,
cognitive theory is not -- and should not ITSELF aspire to be -- amateur
taxonomy or amateur ontology. Cognitive theorists should be trying to
model how categories are represented in the head by testing models of
how devices manage to categorize as people do. The only face-valid
constraint on this enterprise is the data on human (and animal)
categorization performance capacity: What people can actually sort and
label, and what labels and sortings they produce.

Ordinary language users are people; biologists are people; ontologists
are people; sometimes they happen to be the same people, sometimes not.
Sometimes people's categorization performance is reliable and
all-or-none, sometimes not. Sometimes the reliability is or can be
raised to virtually 100% correct all-or-none performance (this is the
core of our categorization capacity); sometimes not. Sometimes (and this
is important) there is (temporarily or permanently) NO BASIS on which
either people OR cognitive theorists can assess whether or not a
categorization is correct, because no detectable consequences follow
from MIScategorization. This may happen (and often does) in certain
anomalous or fuzzy regions of the sample space; but if it happens for
all or most of a "category," then it is simply not a category (or not
yet a category).

So it doesn't really matter who makes up the categories. It just matters
that human performance indicates that they are there, and can be
sorted and labeled on the basis of SOMETHING. If the sorting is all-or-none
and reliable (as it is for a vast core of ordinary cognition) then, I
claim, it must have a classical (invariant featural) basis in the input
instances themselves, or, recursively, in whatever the input instances
are GROUNDED in.

And it also matters that the categories (or, more appropriately,
MIScategorization) must have consequences. This is what guides and
constrains both the categorizer and the categorization theorist. The
categories of ordinary folk are typically calibrated by one species of
consequences (usually related to sustenance and certain [partially
self-imposed] social constraints), whereas the categories of scientists
are calibrated by "empiricism" -- which is to say: the consequences of
experimental tests and the internal coherence and implications of
scientists' explanatory theories.

Sometimes folk and scientific categories square with one another,
sometimes they do not. It is not the cognitive theorist's burden to
equate them, just to model them as both being empirical instances of
human categorization performance capacity. Nor does the "English
Language" integrate them; lay and scientific categories usually simply
get different dictionary entries. In a sense, though, the scientist is
closer to having an integrated category, since he presumably has
internal representations of both, with the lay category encoded as a
special case or weaker approximation of the scientific one. (The
factors of approximation, cumulativity and convergence in
categorization are discussed in my book.) And as POTENTIAL categories
that could be formed by all human beings within one head, it is of
course the burden of the cognitive theorist to model the cumulative
representation too.

The intuitions and introspections of ordinary folk about HOW they
accomplish their categorizations are likely to be of limited usefulness
to the cognitive theorist. The introspections of scientists may be
somewhat more useful, because they tend to be more explicit about the
features they are using, but even here they have no FACE-validity: It's
what Simon DOES that matters, not what Simon SAYS he does. But, in the
end, no "expert" will be able to do the cognitive theorist's work for
him, which is to model the internal representations that will
successfully generate human performance capacity.

" [E]volutionary taxonomists define categories phylogenetically,
" not in terms of observable physiological features...
" Identification procedures that are based on observable features can usually
" be constructed, although these may be based on "family resemblance" ideas.
" Ernst Mayr wrote:
" "A taxon is in fact a group of [evolutionary] relatives, and whether
" "or not they have the same "characters in common" is irrelevant. Many
" "taxa are based on a combination of characters, and frequently not a
" "single one of these characters is present in all members of the
" "taxon...  Each species possesses a large (but unspecified) number of
" "the total number of properties of the taxon"

The key here is that "identification procedures based on observable
features can usually be constructed." That seems to give away the store.
No symbol grounding theory (including my own) -- at least no
non-positivistic one -- would require either laymen or scientists to
speak exclusively in an observation language. But their terms must
somehow be GROUNDED in observations, otherwise how is one to say
whether or not the categorization is "correct"? (In fact, how is one
otherwise even to know what the words mean? Unless they are grounded
somehow in something other than words, they are just meaningless strings
of symbols. That's the symbol grounding problem. And to say that
the "solution" is simply to connect the symbols to objects "in the
right way" is simply to beg the question. For the categorization
problem IS the problem of how symbols come to be connected to objects!)

"Family resemblances" is simply a red herring. Most of this
pseudoproblem is handled by noting that disjunctive features are
perfectly valid features (which is what launched this whole
discussion). So is a complex "polythetic" rule that says "It's an X if
it has at least K out of M properties." Moreover, "common descent"
(though not always available for observation, obviously) seems a
perfectly classical "feature" even on the arbitrary view that only
shared monadic properties qualify as features.
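The claim that a polythetic "K out of M" rule is still a classical feature can be made concrete: the rule is a single all-or-none decision procedure, and it can be rewritten as an ordinary disjunction of conjunctions. What follows is a minimal sketch in Python, assuming binary (present/absent) properties; the property names and the choice of K = 3 are hypothetical illustrations, not drawn from any actual taxonomy.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

def polythetic_member(instance: FrozenSet[str], properties: Iterable[str], k: int) -> bool:
    """All-or-none K-of-M rule: True iff the instance has at least k of the M properties."""
    props = set(properties)
    if not 0 <= k <= len(props):
        raise ValueError("k must lie between 0 and M")
    return len(props & instance) >= k

def disjunctive_member(instance: FrozenSet[str], properties: Iterable[str]) -> bool:
    """A plain disjunction ("has at least one of these") is the special case k = 1."""
    return polythetic_member(instance, properties, 1)

def as_disjunction(properties: Iterable[str], k: int) -> List[FrozenSet[str]]:
    """Rewrite the K-of-M rule as an explicit disjunction of conjunctions:
    one conjunct per k-element subset of the properties."""
    return [frozenset(c) for c in combinations(sorted(set(properties)), k)]

def satisfies_disjunction(instance: FrozenSet[str], disjuncts: List[FrozenSet[str]]) -> bool:
    """True iff the instance contains every property of at least one conjunct."""
    return any(d <= instance for d in disjuncts)

# Hypothetical properties (M = 4) and threshold (K = 3).
PROPS = ["feathers", "beak", "lays_eggs", "wings"]
a = frozenset({"feathers", "beak", "lays_eggs"})  # 3 of 4, no wings
b = frozenset({"beak", "wings", "lays_eggs"})     # 3 of 4, no feathers
c = frozenset({"wings"})                          # 1 of 4

for name, inst in [("a", a), ("b", b), ("c", c)]:
    # Both formulations give the same verdict for every instance.
    assert polythetic_member(inst, PROPS, 3) == satisfies_disjunction(inst, as_disjunction(PROPS, 3))
    print(name, polythetic_member(inst, PROPS, 3))
# a True
# b True
# c False
```

Across the four ways of satisfying this rule, no single property appears in all of them (a lacks wings, b lacks feathers), yet membership is still decided reliably and all-or-none by one invariant, if disjunctive, feature.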

So taxa too, to the extent that they are reliable, decidable,
all-or-none categories at all, must be decided by their consequences:
The consequences are not based on whether or not the biologist
eats, but on whether or not the taxonomic system is internally coherent and
has testable consequences. Internal consistency alone, by the way, is
certainly not good enough, as the long history of arbitrary typologies
mankind has come up with testifies (e.g., astrology, yin/yang, and the
many self-fulfilling, ad hoc, AD LIB classifications that psychologists
have proposed to us across time in place of a substantive predictive
theory); see the prior discussion on imposed vs. ad lib categorization.

" I question whether an average language user is in fact capable of
" always reliably identifying a "bird", a "mammal", or a "fish"...
" I suspect we would fail on unusual cases that are not taught in school
" (e.g. pterodactyls and ichthyosaurs).

It must be repeated that where there is no reliable categorization
performance -- or worse, no objective BASIS for reliable categorization
performance -- there simply IS NO CATEGORY (or not yet a category).
For the cognitive theorist, a category consists of the cases you CAN
sort and label, not those you can't. To ask for more, as I said, is for
cognitive theory to over-reach into the domain of empirical taxonomy or
ontology.

" Hilary Putnam... suggested that definitions can make reference to
" expert knowledge (e.g. "I don't know whether an ichthyosaur is a fish
" or a reptile, but I know who to ask to find out"). This sounds like as
" good a suggestion as any for how the average language user defines
" biological categories.

Putnam is not a cognitive theorist who is concerned with how to model
the internal mechanism that allows us to sort and label inputs. He is
a philosopher concerned with the philosopher's problem of how a name
"fixes" a referent, in the sense that "the elementary particle
physicists will say is basic in the year 2000" seems to "pick out"
something "out there" that I already have "in mind" right now when I
refer to "it." And, in a sense, the physicist's future say-so is a kind
of "feature." But it's more like the "Dumpty-says" feature discussed in
another posting in this discussion. And it's not much use without the
expert oracle. To the cognitive theorist this only indicates that some
categories cannot be sorted without someone else's help. That's not a
very interesting representation.

On the other hand, this example does bring out some interesting
aspects of the grounding problem: The higher levels of discourse
in a grounded symbol system can be quite abstract and removed
from observation, yet they may still be coherent and even informative.
As long as "fish," "reptile," and, say, "vertebrate," are grounded, I
can go on to talk and learn a lot about "Ichthyosaurus" knowing only
that it's a vertebrate that's either a fish or a reptile, despite
neither having ever seen one nor being able, with my current resources,
to pick one out if I ever did see one. This is a powerful
and remarkable feature of grounding. But it is a flagrant flouting of
what I've dubbed the "entry-point problem" for category modeling merely
to step into the category network at an arbitrary point like this,
simply supposing oneself to be the HEIR to all the prior requisite
categories (such as "fish" and "reptile") without having worked for
them or at least specified how THEY got there, and then trying to
say something general about category representations, such as that
"they need not be based on classical features"!
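The recursive structure at issue (higher-level symbols that are usable only because the symbols they are built from are themselves grounded) can be illustrated with a toy check. This is a hypothetical sketch, not the author's model: it assumes a fixed set of symbols taken as already grounded in sensory categorization, treats every other symbol as a Boolean combination of other symbols, and only checks whether each definition eventually bottoms out in grounded symbols. All symbol names are made up for illustration, and the sketch says nothing about how the grounded categories themselves are acquired, which is exactly the step the entry-point problem says cannot be skipped.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# An expression is a symbol name or ("and" | "or", sub-expression, ...).
Expr = Union[str, Tuple]

# Hypothetical bottom level: symbols assumed to be grounded directly in
# sensory categorization (here just names; no detectors are modeled).
GROUNDED: Set[str] = {"backbone", "gills", "lungs", "scales"}

# Hypothetical higher-level symbols, defined only by combining other symbols.
DEFINED: Dict[str, Expr] = {
    "vertebrate": "backbone",
    "fish": ("and", "vertebrate", "gills"),
    "reptile": ("and", "vertebrate", "lungs", "scales"),
    "ichthyosaurus": ("and", "vertebrate", ("or", "fish", "reptile")),
    "snark": ("and", "vertebrate", "boojum"),  # "boojum" is never defined or grounded
    "glork": "blorp",                           # these two are defined only
    "blorp": "glork",                           # in terms of each other
}

def symbols_in(expr: Expr) -> Set[str]:
    """Collect the symbol names mentioned in an expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(symbols_in(e) for e in expr[1:]))

def ungrounded_leaves(symbol: str, seen: FrozenSet[str] = frozenset()) -> Set[str]:
    """Return the symbols on which `symbol` ultimately rests that are neither
    grounded nor defined (or that are reached again through a circular
    definition). An empty result means every path bottoms out in GROUNDED."""
    if symbol in GROUNDED:
        return set()
    if symbol not in DEFINED or symbol in seen:
        return {symbol}
    leaves: Set[str] = set()
    for s in symbols_in(DEFINED[symbol]):
        leaves |= ungrounded_leaves(s, seen | {symbol})
    return leaves

for word in ["ichthyosaurus", "snark", "glork"]:
    print(word, sorted(ungrounded_leaves(word)))
# ichthyosaurus []
# snark ['boojum']
# glork ['glork']
```

In this toy, "ichthyosaurus" counts as grounded even though nothing in it could pick out an ichthyosaur from sensory input; it inherits its grounding from "vertebrate," "fish," and "reptile." By contrast, "snark" rests on an undefined term, and the circular pair rests on nothing but each other.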

So I may defer to expert knowledge in order to talk at all about
some of my vaguer categories, but that's hardly the paradigm for
my categorization performance and its substrates. According to
my grounding theory, I must have done a lot of hard work by direct
acquaintance with sensory categories before I built up the grounded
system that now allows me to rely on experts' say-so. The theorist has
to do a lot of hard work too, before he can help himself to this
derivative high-level capability.
-- 
Stevan Harnad INTERNET:  harnad@confidence.princeton.edu    harnad@princeton.edu
srh@flash.bellcore.com    harnad@elbereth.rutgers.edu      harnad@princeton.uucp
BITNET:   harnad@pucc.bitnet           CSNET:  harnad%princeton.edu@relay.cs.net
(609)-921-7771