Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!princeton!mind!harnad
From: harnad@mind.UUCP (Stevan Harnad)
Newsgroups: comp.ai,comp.cog-eng
Subject: Re: The symbol grounding problem (Part 2 of 2)
Message-ID: <770@mind.UUCP>
Date: Fri, 22-May-87 14:08:53 EDT
Article-I.D.: mind.770
Posted: Fri May 22 14:08:53 1987
Date-Received: Sat, 23-May-87 16:14:08 EDT
References: <764@mind.UUCP> <768@mind.UUCP>
Organization: Cognitive Science, Princeton University
Lines: 219
Keywords: icons, categories, symbols, grounding
Summary: Symbols cannot be grounded in texts
Xref: mnetor comp.ai:441 comp.cog-eng:102

Rik Belew writes:

> I use ``icon'' to mean much the same as your ``categorical
> representations''... their direct, albeit statistical,
> relationship with sensory features... distinguishes icons from
> ``symbols'', which are representations without structural
> correspondence with the environment.

The criterion for being iconic is physical isomorphism ( = "structural correspondence"). This means that the relationship between an object and its icon must be a physically invertible (analog) transformation. In my model, iconic representations are isomorphic with the unfiltered sensory projection of the input they represent, whereas categorical representations are only isomorphic with selected features of the input. In that sense they are "micro-iconic." The important point is that they are selective and based on abstracting some features and discarding all the rest. The basis of selection is: "What features do I need in order to categorize this input correctly, relative to other confusable alternatives I have encountered and may encounter in the future?"

To call the input an "X" on the basis of such a selective, context-governed feature filter, however, is hardly to say that one has an "icon" of an "X" in the same sense that iconic representations are icons of input sensory projections.
The "structural correspondence" is only with the selected features, not with the "object" being named. On the other hand, the complete absence of any structural correspondence whatever is indeed what distinguishes both iconic and categorical representations from symbolic ones.

The heart of my symbol grounding proposal is that in allowing you to speak of (identify, label, categorize) "X's" at all, categorical representations have provided you with a set of elementary labels, based on nonsymbolic representations, that can now ground an otherwise purely syntactic symbol system in the objects and events to which it refers. Note, though, that the grounding is a strong constraint, one that renders the symbolic system no longer the autonomous syntactic module of conventional AI. The system is hybrid through-and-through. The relations between the three kinds of representation are not modular but bottom-up, with the nonsymbolic representations supporting the symbolic representations' relation to objects. Most of the rules for symbol binding, etc. are now constrained in ways that depart from the freedom of ungrounded formal systems.

> Your, more restricted, notion of ``symbol'' seems to differ in two
> major respects: its emphasis on the systematicity of symbols; and its
> use of LABELS (of categories) as the atomic elements. I accept
> the systematicity requirement, but I believe your labeling notion
> confounds several important factors...
> First, I believe you are using labels to mean POINTERS:
> computationally efficient references to more elaborate and complete
> representations... valuable not only for pointing from symbols
> to icons (the role you intend for labels) but also from one place in
> the symbolic representation to another...
> many connectionists have taken this pointer quality to be
> what they mean by "symbol."

I believe my grounding proposal is a lot more specific than merely a pointing proposal. Pointing is, after all, a symbol-to-symbol function.
It may get you to an address, but it won't get you from a word to the nonsymbolic object to which it refers. The labeling performance that categorical representations subserve, on the other hand, is an operation on objects in the world. That is why I proposed grounding elementary symbols in it: Let the arbitrary labels of reliably sorted object categories be the elementary symbols of the symbolic system. Such a hybrid system would continue to have most of the benefits of higher-order systematicity (compositionality), but with nonsymbolic constraints "weighing down" its elementary terms. Consider ordinary syntactic constraints to be "top-down" constraints on a symbol-system. A grounded hybrid system would have "bottom-up" constraints on its symbol combinations too.

As to the symbolic status of connectionism -- that still seems to be moot.

> The other feature of your labeling notion that intrigues me is
> the naming activity it implies. This is where I see the issues
> of language as becoming critical. ...truly symbolic representations and
> language are co-dependent. I believe we agree on this point...
> true symbol manipulation arose only as a response to language
> Current connectionist research is showing just how
> powerful iconic (and perhaps categorical) representations can
> be... I use the term language broadly, to
> include the behavior of other animals for example.

Labeling and categorizing is much more primitive than language, and that's all I require to ground a symbol system. All this calls for is reliable discrimination and identification of objects. Animals certainly do it. Machines should be able to do it (although until they approach the performance capacity of the "Total Turing Test" they may be doing it modularly in a nonrepresentative way). Language seems to be more than labeling and categorizing. It also requires *describing*, and that requires symbol-combining functions that in my model depend critically on prior labeling and categorizing.
Again, the symbolic/nonsymbolic status of connectionism still seems to be under analysis. In my model the provisional role of connectionistic processes is in inducing and encoding the invariant features in the categorical representation.

> the aspect of symbols [that] connectionism
> needs most is something resembling pointers. More elaborate notions of
> symbol introduce difficult semantic issues of language that can be
> separated and addressed independently... Without pointers,
> connectionist systems will be restricted to ``iconic'' representations
> whose close correspondence with the literal world severely limits them
> from ``subserving'' most higher (non-lingual) cognitive functioning.

I don't think pointer function can be divorced from semantic issues in a symbol system. Symbols don't just combine and recombine according to syntactic rules, they are also semantically interpretable. Pointing is a symbol-to-symbol relation. Semantics is a symbol-to-object relationship. But without a semantically interpretable system you don't have a symbol system at all, so what would be pointing to what?

For what it's worth, I don't personally believe that there is any point in connectionism's trying to emulate bits and pieces of the virtues of symbol systems, such as pointing. Symbolic AI's problem was that it had symbol strings that were interpretable as "standing for" objects and events, but that relation seemed to be in the head of the (human) interpreter, i.e., it was derivative, ungrounded. Except where this could be resolved by brute-force hard-wiring into a dedicated system married to its peripheral devices, this grounding problem remained unsolved for pure symbolic AI. Why should connectionism aspire to inherit it? Sure, having objects around that you can interpret as standing for things in the world and yet still manipulate formally is a strength.
But at some point the interpretation must be cashed in (at least in mind-modeling) and then the strength becomes a weakness. Perhaps a role in the hybrid mediation between the symbolic and the nonsymbolic is more appropriate for connectionism than direct competition or emulation.

> While I agree with the aims of your Total Turing Test (TTT),
> viz. capturing the rich interrelated complexity characteristic
> of human cognition, I have never found this direct comparison
> to human performance helpful. A criterion of cognitive
> adequacy that relies so heavily on comparison with humans
> raises many tangential issues. I can imagine many questions
> (e.g., regarding sex, drugs, rock and roll) that would easily
> discriminate between human and machine. Yet I do not see such
> questions illuminating issues in cognition.

My TTT criterion has been much debated on the Net. The short reply is that the goal of the TTT is not to capture complexity but to capture performance capacity, and the only way to maximize your confidence that you're capturing it the right way (i.e., the way the mind does it) is to capture all of it. This does not mean sex, drugs and rock and roll (there are people who do none of these). It means (1) formally, that a candidate model must generate all of our generic performance capacities (of discriminating, identifying, manipulating and describing objects and events, and producing and responding appropriately to names and descriptions), and (2) informally, that the way it does so must be intuitively indistinguishable from the way a real person does, as judged by a real person. The goal is asymptotic, but it's the only one so far proposed that cuts the underdetermination of cognitive theory down to the size of the ordinary underdetermination of scientific theory by empirical observations: It's the next best thing to being there (in the mind of the robot).
> First, let's do our best to imagine providing an artificial cognitive
> system (a robot) with the sort of grounding experience you and I both
> believe necessary to full cognition. Let's give it video eyes,
> microphone ears, feedback from its affectors, etc. And let's even
> give it something approaching the same amount of time in this
> environment that the developing child requires...
> the corpus of experience acquired by such a robot is orders of magnitude
> more complex than any system today... [yet] even such a complete
> system as this would have a radically different experience of the
> world than our own. The communication barrier between the symbols
> of man and the symbols of machine to which I referred in my last
> message is a consequence of this [difference].

My own conjecture is that simple peripheral modules like these will *not* be enough to ground an artificial cognitive system, at least not enough to make any significant progress toward the TTT. The kind of grounding I'm proposing calls for nonsymbolic internal representations of the kind I described (iconic representations [IRs] and categorical representations [CRs]), related to one another and to input and output in the way I described. The critical thing is not the grounding *experience*, but what the system can *do* with it in order to discriminate and identify as we do. I have hypothesized that it must have IRs and CRs in order to do so.

The problem is not complexity (at least not directly), but performance capacity, and what it takes to generate it. And the only relevant difference between contemporary machine models and people is not their *experience* per se, but their performance capacities. No model comes close. They're all special-purpose toys. And the ultimate test of man/machine "communication" is of course the TTT!
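
A toy sketch may help fix the IR/CR distinction drawn in this post. It is only an illustration under loud assumptions, not the model itself: the sensory projection is reduced to a short list of numbers, the "analog transformation" to a hypothetical scaling by GAIN, and the learned feature filter to a hand-picked list of indices and a threshold. All function names are invented for the example. The point it tries to show is that the IR can be inverted to recover the whole projection, whereas the CR keeps only the selected features, and the arbitrary label is assigned from those alone.

```python
from typing import List, Sequence

GAIN = 2.0  # arbitrary analog scaling factor (assumption, for illustration only)


def iconic(projection: Sequence[float]) -> List[float]:
    """Iconic representation (IR): an invertible transform of the whole projection."""
    return [GAIN * x for x in projection]


def invert_iconic(icon: Sequence[float]) -> List[float]:
    """Recover the original projection from its icon."""
    return [x / GAIN for x in icon]


def categorical(projection: Sequence[float], selected: Sequence[int]) -> List[float]:
    """Categorical representation (CR): keep the selected features, discard the rest."""
    return [projection[i] for i in selected]


def label(features: Sequence[float], threshold: float) -> str:
    """Assign an arbitrary category name using the selected features alone."""
    mean = sum(features) / len(features)
    return "X" if mean > threshold else "not-X"


# Two hypothetical projections that differ only in the discarded features.
p = [0.9, 0.1, 0.8, 0.3]
q = [0.9, 0.7, 0.8, 0.0]
selected = [0, 2]  # hand-picked stand-in for an induced feature filter

print(invert_iconic(iconic(p)) == p)                         # IR is invertible
print(iconic(p) == iconic(q))                                # icons differ
print(categorical(p, selected) == categorical(q, selected))  # CRs coincide
print(label(categorical(p, selected), threshold=0.5))        # label from the CR
```

In this sketch p and q have different icons but receive the same label, because they agree on the selected features. Nothing in it says how the filter would be induced, which is the provisional role suggested above for connectionist processes.
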
> So the question for me becomes: how might we give a machine the
> same rich corpus of experience (hence satisfying the total part
> of your TTT) without relying on such direct experiential
> contact with the world? The answer for me (at the moment) is
> to begin at the level of WORDS... the enormous textual
> databases of information retrieval (IR) systems...
> I want to take this huge set of ``labels,'' attached by humans to
> their world, as my primitive experiential database...
> The task facing my system, then, is to look at and learn from this
> world:... the textbase itself [and] interactions with IR users...
> the system then adapts its (connectionist) representation...

Your hypothesis is that an information retrieval system whose only source of input is text (symbols) plus feedback from human users (more symbols) will capture a significant component of cognition. Your hypothesis may be right. My own conjecture, however, is the exact opposite. I don't believe that input consisting of nothing but symbols constitutes "experience." I think it constitutes (ungrounded) symbols, inheriting, as usual, the interpretations of the users with whom the system interacts. I don't think that doing connectionism instead of symbol-crunching with this kind of input makes it any more likely to overcome the groundedness problem, but again, I may be wrong. But performance capacity (not experience) -- i.e., the TTT -- will have to be the ultimate arbiter of these hypotheses.

--
Stevan Harnad (609) - 921 7771
{bellcore, psuvax1, seismo, rutgers, packard} !princeton!mind!harnad
harnad%mind@princeton.csnet
harnad@mind.Princeton.EDU