Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!princeton!mind!harnad
From: harnad@mind.UUCP (Stevan Harnad)
Newsgroups: comp.ai,comp.cog-eng
Subject: Re: The symbol grounding problem (Part 2 of 2)
Message-ID: <770@mind.UUCP>
Date: Fri, 22-May-87 14:08:53 EDT
Article-I.D.: mind.770
Posted: Fri May 22 14:08:53 1987
Date-Received: Sat, 23-May-87 16:14:08 EDT
References: <764@mind.UUCP> <768@mind.UUCP>
Organization: Cognitive Science, Princeton University
Lines: 219
Keywords: icons, categories, symbols, grounding
Summary: Symbols cannot be grounded in texts
Xref: mnetor comp.ai:441 comp.cog-eng:102


Rik Belew <rik%roland@SDCSVAX.UCSD.EDU> writes:

>	I use ``icon'' to mean much the same as your ``categorical
>	representations''... their direct, albeit statistical,
>	relationship with sensory features... distinguishes icons from
>	``symbols'', which are representations without structural
>	correspondence with the environment.

The criterion for being iconic is physical isomorphism
( = "structural correspondence"). This means that the relationship
between an object and its icon must be a physically invertible
(analog) transformation. In my model, iconic representations
are isomorphic with the unfiltered sensory projection of the
input they represent, whereas categorical representations
are only isomorphic with selected features of the input.
In that sense they are "micro-iconic." The important point is
that they are selective and based on abstracting some features and
discarding all the rest. The basis of selection is: "What features do
I need in order to categorize this input correctly, relative to other
confusable alternatives I have encountered and may encounter in the
future?" To call the input an "X" on the basis of such a selective,
context-governed feature filter, however, is hardly to say that one
has an "icon" of an "X" in the same sense that iconic representations
are icons of input sensory projections. The "structural
correspondence" is only with the selected features, not with the "object"
being named.
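
To make the contrast concrete, here is a minimal sketch in Python. It is
only a toy under stated assumptions: the names (iconic, invert_iconic,
select_features, categorical), the doubling transform and the
range-separation rule are all hypothetical illustrations, not part of
the model itself. It shows an invertible iconic transform next to a
selective, non-invertible categorical filter.

```python
from typing import Sequence, Tuple

Projection = Tuple[float, ...]  # toy stand-in for a sensory projection


def iconic(projection: Projection) -> Projection:
    """Hypothetical analog transform: invertible, keeps every feature."""
    return tuple(2.0 * x for x in projection)


def invert_iconic(icon: Projection) -> Projection:
    """Recover the original projection from its icon."""
    return tuple(x / 2.0 for x in icon)


def select_features(members: Sequence[Projection],
                    confusables: Sequence[Projection]) -> Tuple[int, ...]:
    """Assumed, crude selection rule: keep the feature indices whose value
    ranges separate category members from the confusable alternatives
    encountered so far."""
    chosen = []
    for i in range(len(members[0])):
        m = [p[i] for p in members]
        c = [p[i] for p in confusables]
        if max(m) < min(c) or max(c) < min(m):
            chosen.append(i)
    return tuple(chosen)


def categorical(projection: Projection,
                selected: Sequence[int]) -> Projection:
    """Keep only the selected features; everything else is discarded."""
    return tuple(projection[i] for i in selected)


# Made-up sample data: two members of a category and two confusable nonmembers.
members = [(1.0, 5.0, 0.2), (1.2, 9.0, 0.9)]
confusables = [(3.0, 6.0, 0.5), (3.5, 8.0, 0.1)]
selected = select_features(members, confusables)
```

In this toy, the icon can be inverted back to the projection, whereas
two distinct projections that agree on the selected features receive
the same categorical representation, so the latter corresponds only to
the selected features and not to the whole input.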

On the other hand, the complete absence of any structural
correspondence whatever is indeed what distinguishes both iconic and
categorical representations from symbolic ones. The heart of my symbol
grounding proposal is that in allowing you to speak of (identify,
label, categorize) "X's" at all, categorical representations have
provided you with a set of elementary labels, based on nonsymbolic
representations, that can now ground an otherwise purely syntactic
symbol system in the objects and events to which it refers. Note,
though, that the grounding is a strong constraint, one that renders
the symbolic system no longer the autonomous syntactic module of
conventional AI. The system is hybrid through-and-through. The
relations between the three kinds of representation are not modular but
bottom-up, with the nonsymbolic representations supporting the
symbolic representations' relation to objects. Most of the rules for
symbol binding, etc. are now constrained in ways that depart from the
freedom of ungrounded formal systems.

>	Your, more restricted, notion of ``symbol'' seems to differ in two
>	major respects: its emphasis on the systematicity of symbols; and its
>	use of LABELS (of categories) as the atomic elements.  I accept
>	the systematicity requirement, but I believe your labeling notion
>	confounds several important factors...
>	First, I believe you are using labels to mean POINTERS:
>	computationally efficient references to more elaborate and complete
>	representations... valuable not only for pointing from symbols
>	to icons (the role you intend for labels) but also from one place in
>	the symbolic representation to another...
>	many connectionists have taken this pointer quality to be
>	what they mean by "symbol."

I believe my grounding proposal is a lot more specific than merely a
pointing proposal. Pointing is, after all, a symbol-to-symbol
function. It may get you to an address, but it won't get you from a
word to the nonsymbolic object to which it refers. The labeling
performance that categorical representations subserve, on the other
hand, is an operation on objects in the world. That is why I proposed
grounding elementary symbols in it: Let the arbitrary labels of
reliably sorted object categories be the elementary symbols of the
symbolic system. Such a hybrid system would continue to have most of
the benefits of higher-order systematicity (compositionality), but with
nonsymbolic constraints "weighing down" its elementary terms. Consider
ordinary syntactic constraints to be "top-down" constraints on a
symbol-system. A grounded hybrid system would have "bottom-up"
constraints on its symbol combinations too.
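
The difference between pointing and labeling can be sketched roughly as
follows, again in Python and again only as a toy. Everything named here
(POINTERS, dereference, make_labeler, ELEMENTARY, admissible, and the
threshold test standing in for a learned feature filter) is a
hypothetical illustration, not a specification of the model.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Projection = Tuple[float, ...]  # toy stand-in for a sensory projection

# Pointing: a symbol-to-symbol function. Following a pointer only ever
# yields another symbol (an address), never an object.
POINTERS: Dict[str, str] = {"S1": "S2", "S2": "S3"}


def dereference(symbol: str) -> str:
    return POINTERS[symbol]


# Labeling: an operation on (the projection of) an object. The threshold
# test is an assumed placeholder for a categorical feature filter.
def make_labeler(threshold: float) -> Callable[[Projection], str]:
    def label(projection: Projection) -> str:
        return "X" if projection[0] < threshold else "Y"
    return label


label = make_labeler(2.0)

# The arbitrary labels the sorter can assign serve as the elementary symbols.
ELEMENTARY: FrozenSet[str] = frozenset({"X", "Y"})


def admissible(combination: Tuple[str, ...]) -> bool:
    """Assumed 'bottom-up' constraint: every term in a symbol combination
    must be a label the categorizer can actually assign to input."""
    return all(term in ELEMENTARY for term in combination)
```

Here dereference("S1") gets you to "S2" and no further than another
symbol, while label() takes a projection of an object as its argument.
A combination containing "S1" fails the admissible() check because
nothing nonsymbolic stands behind that term.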

As to the symbolic status of connectionism -- that still seems to be moot.

>	The other feature of your labeling notion that intrigues me is
>	the naming activity it implies.  This is where I see the issues
>	of language as becoming critical. ...truly symbolic representations and
>	language are co-dependent. I believe we agree on this point...
>	true symbol manipulation arose only as a response to language
>	Current connectionist research is showing just how
>	powerful iconic (and perhaps categorical) representations can
>	be... I use the term language broadly, to
>	include the behavior of other animals for example.

Labeling and categorizing are much more primitive than language, and
they are all I require to ground a symbol system. All this calls for is
reliable discrimination and identification of objects. Animals
certainly do it. Machines should be able to do it (although until they
approach the performance capacity demanded by the "Total Turing Test"
they may be doing it modularly in a nonrepresentative way). Language
seems to be more than labeling and categorizing. It also requires
*describing*,
and that requires symbol-combining functions that in my model depend
critically on prior labeling and categorizing.

Again, the symbolic/nonsymbolic status of connectionism still seems to
be under analysis. In my model the provisional role of connectionistic
processes is in inducing and encoding the invariant features in the
categorical representation.

>	the aspect of symbols [that] connectionism
>	needs most is something resembling pointers. More elaborate notions of
>	symbol introduce difficult semantic issues of language that can be
>	separated and addressed independently... Without pointers,
>	connectionist systems will be restricted to ``iconic'' representations
>	whose close correspondence with the literal world severely limits them
>	from ``subserving'' most higher (non-lingual) cognitive functioning.

I don't think pointer function can be divorced from semantic issues in
a symbol system. Symbols don't just combine and recombine according to
syntactic rules; they are also semantically interpretable. Pointing is a
symbol-to-symbol relation. Semantics is a symbol-to-object
relationship. But without a semantically interpretable system you
don't have a symbol system at all, so what would be pointing to what?

For what it's worth, I don't personally believe that there is any
point in connectionism's trying to emulate bits and pieces of the
virtues of symbol systems, such as pointing. Symbolic AI's
problem was that it had symbol strings that were interpretable as
"standing for" objects and events, but that relation seemed to be in
the head of the (human) interpreter, i.e., it was derivative, ungrounded.
Except where this could be resolved by brute-force hard-wiring into a
dedicated system married to its peripheral devices, this grounding
problem remained unsolved for pure symbolic AI. Why should
connectionism aspire to inherit it? Sure, having objects around that
you can interpret as standing for things in the world and yet still
manipulate formally is a strength. But at some point the
interpretation must be cashed in (at least in mind-modeling) and then
the strength becomes a weakness. Perhaps a role in the hybrid mediation
between the symbolic and the nonsymbolic is more appropriate for
connectionism than direct competition or emulation.

>	While I agree with the aims of your Total Turing Test (TTT),
>	viz. capturing the rich interrelated complexity characteristic
>	of human cognition, I have never found this direct comparison
>	to human performance helpful.  A criterion of cognitive
>	adequacy that relies so heavily on comparison with humans
>	raises many tangential issues.  I can imagine many questions
>	(e.g., regarding sex, drugs, rock and roll) that would easily
>	discriminate between human and machine. Yet I do not see such
>	questions illuminating issues in cognition. 

My TTT criterion has been much debated on the Net. The short reply is
that the goal of the TTT is not to capture complexity but to capture
performance capacity, and the only way to maximize your confidence
that you're capturing it the right way (i.e., the way the mind does it)
is to capture all of it. This does not mean sex, drugs and rock and
roll (there are people who do none of these). It means (1) (formally)
that a candidate model must generate all of our generic performance
capacities (of discriminating, identifying, manipulating and describing
objects and events, and producing and responding appropriately to names
and descriptions), and (2) (informally) that the way it does so must be
intuitively indistinguishable from the way a real person does, as
judged by a real person. The goal is asymptotic, but it's
the only one so far proposed that cuts the underdetermination of
cognitive theory down to the size of the ordinary underdetermination of
scientific theory by empirical observations: It's the next best thing
to being there (in the mind of the robot).

>	First, let's do our best to imagine providing an artificial cognitive
>	system (a robot) with the sort of grounding experience you and I both
>	believe necessary to full cognition.  Let's give it video eyes,
>	microphone ears, feedback from its affectors, etc.  And let's even
>	give it something approaching the same amount of time in this
>	environment that the developing child requires...
>	the corpus of experience acquired by such a robot is orders of magnitude
>	more complex than any system today... [yet] even such a complete
>	system as this would have a radically different experience of the
>	world than our own. The communication barrier between the symbols
>	of man and the symbols of machine to which I referred in my last
>	message is a consequence of this [difference].

My own conjecture is that simple peripheral modules like these will *not* be
enough to ground an artificial cognitive system, at least not
enough to make any significant progress toward the TTT. The kind of
grounding I'm proposing calls for nonsymbolic internal representations
of the kind I described (iconic representations [IRs] and categorical
representations [CRs]), related to one another and to input and output in
the way I described. The critical thing is not the grounding
*experience*, but what the system can *do* with it in order to
discriminate and identify as we do. I have hypothesized that it must have
IRs and CRs in order to do so. The problem is not complexity (at least
not directly), but performance capacity, and what it takes to generate
it. And the only relevant difference between contemporary machine
models and people lies not in their *experience* per se, but in their
performance capacities. No model comes close. They're all
special-purpose toys. And the ultimate test of man/machine
"communication" is of course the TTT!

>	So the question for me becomes: how might we give a machine the
>	same rich corpus of experience (hence satisfying the total part
>	of your TTT) without relying on such direct experiential
>	contact with the world?  The answer for me (at the moment) is
>	to begin at the level of WORDS... the enormous textual
>	databases of information retrieval (IR) systems...
>	I want to take this huge set of ``labels,'' attached by humans to
>	their world, as my primitive experiential database... 
>	The task facing my system, then, is to look at and learn from this
>	world:... the textbase itself [and] interactions with IR users...
>	the system then adapts its (connectionist) representation...

Your hypothesis is that an information retrieval system whose only
source of input is text (symbols) plus feedback from human users (more
symbols) will capture a significant component of cognition. Your
hypothesis may be right. My own conjecture, however, is the exact
opposite. I don't believe that input consisting of nothing but symbols
constitutes "experience." I think it constitutes (ungrounded) symbols,
inheriting, as usual, the interpretations of the users with which the
system interacts. I don't think that doing connectionism instead of
symbol-crunching with this kind of input makes it any more likely to
overcome the grounding problem, though again, I may be wrong.
Performance capacity (not experience) -- i.e., the TTT -- will have
to be the ultimate arbiter of these hypotheses.
-- 

Stevan Harnad                                  (609) - 921 7771
{bellcore, psuvax1, seismo, rutgers, packard}  !princeton!mind!harnad
harnad%mind@princeton.csnet       harnad@mind.Princeton.EDU