From: harnad@elbereth.rutgers.edu (Stevan Harnad)
Newsgroups: comp.ai
Subject: Re: Biological Categorization
Summary: On doing cognitive theory vs. doing ontology
Date: 21 Jan 89 18:42:32 GMT
References: <681@cogsci.ucsd.EDU> <2959@uhccux.uhcc.hawaii.edu> <1007@husc6.harvard.edu>
Organization: Rutgers Univ., New Brunswick, N.J.

reiter@endor.harvard.edu (Ehud Reiter) of Aiken Computation Lab Harvard, Cambridge, MA wrote:

" [W]ho makes up the categories? Professional biologists [categories] are
" pretty good at predicting [biological] details, but much less useful at
" predicting more mundane attributes like edibility... [H]ow useful
" modern English biological categories are to the average language user
" (as opposed to the professional biologist) may be questionable.

A persistent misunderstanding (or perhaps a divergence of interest) seems to be running through some aspects of this discussion. In my view, cognitive theory is not -- and should not ITSELF aspire to be -- amateur taxonomy or amateur ontology.

Cognitive theorists should be trying to model how categories are represented in the head by testing models of how devices manage to categorize as people do. The only face-valid constraint on this enterprise is the data on human (and animal) categorization performance capacity: What people can actually sort and label, and what labels and sortings they produce.

Ordinary language users are people; biologists are people; ontologists are people; sometimes they happen to be the same people, sometimes not. Sometimes people's categorization performance is reliable and all-or-none, sometimes not. Sometimes the reliability is or can be raised to virtually 100% correct all-or-none performance (this is the core of our categorization capacity); sometimes not.
Sometimes (and this is important) there is (temporarily or permanently) NO BASIS on which either people OR cognitive theorists can assess whether or not a categorization is correct, because no detectable consequences follow from MIScategorization. This may happen (and often does) in certain anomalous or fuzzy regions of the sample space; but if it happens for all or most of a "category," then it is simply not a category (or not yet a category).

So it doesn't really matter who makes up the categories. It just matters that human performance indicates that they are there, and that their members can be sorted and labeled on the basis of SOMETHING. If the sorting is all-or-none and reliable (as it is for a vast core of ordinary cognition), then, I claim, it must have a classical (invariant featural) basis in the input instances themselves, or, recursively, in whatever the input instances are GROUNDED in.

And it also matters that the categories (or, more appropriately, MIScategorization) must have consequences. This is what guides and constrains both the categorizer and the categorization theorist.

The categories of ordinary folk are typically calibrated by one species of consequences (usually related to sustenance and certain [partially self-imposed] social constraints), whereas the categories of scientists are calibrated by "empiricism" -- which is to say: the consequences of experimental tests and the internal coherence and implications of scientists' explanatory theories. Sometimes folk and scientific categories square with one another, sometimes they do not. It is not the cognitive theorist's burden to equate them, just to model both as empirical instances of human categorization performance capacity. Nor does the "English Language" integrate them; lay and scientific categories usually simply get different dictionary entries.
In a sense, though, the scientist is closer to having an integrated category, since he presumably has internal representations of both, with the lay category encoded as a special case or weaker approximation of the scientific one. (The factors of approximation, cumulativity and convergence in categorization are discussed in my book.) And since both are POTENTIAL categories that any human being could form within a single head, it is of course the burden of the cognitive theorist to model the cumulative representation too.

The intuitions and introspections of ordinary folk about HOW they accomplish their categorizations are likely to be of limited usefulness to the cognitive theorist. The introspections of scientists may be somewhat more useful, because they tend to be more explicit about the features they are using, but even here they have no FACE-validity: It's what Simon DOES that matters, not what Simon SAYS he does. But, in the end, no "expert" will be able to do the cognitive theorist's work for him, which is to model the internal representations that will successfully generate human performance capacity.

" [E]volutionary taxonomists define categories phylogenetically,
" not in terms of observable physiological features...
" Identification procedures that are based on observable features can usually
" be constructed, although these may be based on "family resemblance" ideas.
" Ernst Mayr wrote:
" "A taxon is in fact a group of [evolutionary] relatives, and whether
" "or not they have the same "characters in common" is irrelevant. Many
" "taxa are based on a combination of characters, and frequently not a
" "single one of these characters is present in all members of the
" "taxon... Each species possesses a large (but unspecified) number of
" "the total number of properties of the taxon"

The key here is that "identification procedures based on observable features can usually be constructed." That seems to give away the store.
No symbol grounding theory (including my own) -- at least no non-positivistic one -- would require either laymen or scientists to speak exclusively in an observation language. But their terms must somehow be GROUNDED in observations; otherwise, how is one to say whether or not the categorization is "correct"? (In fact, how is one otherwise even to know what the words mean? Unless they are grounded somehow in something other than words, they are just meaningless strings of symbols. That's the symbol grounding problem. And to say that the "solution" is simply to connect the symbols to objects "in the right way" is to beg the question. For the categorization problem IS the problem of how symbols come to be connected to objects!)

"Family resemblances" is simply a red herring. Most of this pseudoproblem is handled by noting that disjunctive features are perfectly valid features (which is what launched this whole discussion). So is a complex "polythetic" rule that says "It's an X if it has at least K out of M properties." Moreover, "common descent" (though obviously not always available for observation) seems a perfectly classical "feature," even on the arbitrary view that only shared monadic properties qualify as features.

So taxa too, to the extent that they are reliable, decidable, all-or-none categories at all, must be decided by their consequences: The consequences are not based on whether or not the biologist eats, but on whether or not the taxonomic system is internally coherent and has testable consequences. Internal consistency alone, by the way, is certainly not good enough, as the long history of arbitrary typologies mankind has come up with testifies (e.g., astrology, yin/yang, and the many self-fulfilling, ad hoc, AD LIB classifications that psychologists have proposed to us over time in place of a substantive predictive theory); see the prior discussion on imposed vs. ad lib categorization.
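The claim that disjunctive features and "at least K out of M" polythetic rules are still perfectly classical features can be made concrete. The following is a minimal Python sketch under hypothetical assumptions. An instance is taken to be just the set of binary properties it exhibits. The property names "a" through "f" are placeholders for real observable characters, and the helpers `has`, `any_of` and `at_least` are invented for illustration. It is not a model proposed anywhere in this discussion.

```python
from typing import Callable, FrozenSet

# Assumption: an instance is represented as the set of binary properties it exhibits.
Instance = FrozenSet[str]
# A feature is any all-or-none predicate on an instance.
Feature = Callable[[Instance], bool]


def has(prop: str) -> Feature:
    """Monadic feature: true iff the instance exhibits the named property."""
    return lambda x: prop in x


def any_of(*features: Feature) -> Feature:
    """Disjunctive feature: true iff at least one component feature holds."""
    return lambda x: any(f(x) for f in features)


def at_least(k: int, *features: Feature) -> Feature:
    """Polythetic rule: true iff at least k of the given M features hold."""
    return lambda x: sum(f(x) for f in features) >= k


# Hypothetical category X: an instance belongs iff it has any 3 of the 5 properties a..e.
is_x = at_least(3, *[has(p) for p in "abcde"])
# Hypothetical disjunctive feature: "has property a OR property f".
a_or_f = any_of(has("a"), has("f"))

members = [frozenset("abc"), frozenset("cde"), frozenset("ade")]
non_member = frozenset("ab")

# Properties common to every member (Mayr's "characters in common").
shared = frozenset.intersection(*members)
```

Every instance in `members` satisfies `is_x` and `non_member` does not. Yet `shared` is empty: no single property is present in all members. The rule is polythetic but still a fixed, decidable, all-or-none invariant over the instances' properties, which is the sense of "classical feature" at issue here.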
" I question whether an average language user is in fact capable of
" always reliably identifying a "bird", a "mammal", or a "fish"...
" I suspect we would fail on unusual cases that are not taught in school
" (e.g. pterodactyls and ichthyosaurs).

It must be repeated that where there is no reliable categorization performance -- or worse, no objective BASIS for reliable categorization performance -- there simply IS NO CATEGORY (or not yet a category). For the cognitive theorist, a category consists of the cases you CAN sort and label, not those you can't. To ask for more, as I said, is for cognitive theory to over-reach into the domain of empirical taxonomy or ontology.

" Hilary Putnam... suggested that definitions can make reference to
" expert knowledge (e.g. "I don't know whether an ichthyosaur is a fish
" or a reptile, but I know who to ask to find out"). This sounds like as
" good a suggestion as any for how the average language user defines
" biological categories.

Putnam is not a cognitive theorist who is concerned with how to model the internal mechanism that allows us to sort and label inputs. He is a philosopher concerned with the philosopher's problem of how a name "fixes" a referent, in the sense that "the elementary particle physicists will say is basic in the year 2000" seems to "pick out" something "out there" that I already have "in mind" right now when I refer to "it." And, in a sense, the physicist's future say-so is a kind of "feature." But it's more like the "Dumpty-says" feature discussed in another posting in this discussion. And it's not much use without the expert oracle. To the cognitive theorist this only indicates that some categories cannot be sorted without someone else's help. That's not a very interesting representation.
On the other hand, this example does bring out some interesting aspects of the grounding problem: The higher levels of discourse in a grounded symbol system can be quite abstract and removed from observation, yet they may still be coherent and even informative. As long as "fish," "reptile," and, say, "vertebrate," are grounded, I can go on to talk and learn a lot about "Ichthyosaurus" knowing only that it's a vertebrate that's either a fish or a reptile, despite never having seen one and being unable, with my current resources, to pick one out if I ever did see one. This is a powerful and remarkable feature of grounding.

But it is a flagrant flouting of what I've dubbed the "entry-point problem" for category modeling merely to step into the category network at an arbitrary point like this, simply supposing oneself to be the HEIR to all the prior requisite categories (such as "fish" and "reptile") without having worked for them or at least specified how THEY got there, and then trying to say something general about category representations, such as that "they need not be based on classical features"!

So I may defer to expert knowledge in order to talk at all about some of my vaguer categories, but that's hardly the paradigm for my categorization performance and its substrates. According to my grounding theory, I must have done a lot of hard work by direct acquaintance with sensory categories before I built up the grounded system that now allows me to rely on experts' say-so. The theorist has to do a lot of hard work too, before he can help himself to this derivative high-level capability.

-- 
Stevan Harnad
INTERNET: harnad@confidence.princeton.edu harnad@princeton.edu srh@flash.bellcore.com harnad@elbereth.rutgers.edu harnad@princeton.uucp
BITNET: harnad@pucc.bitnet
CSNET: harnad%princeton.edu@relay.cs.net
(609)-921-7771