5. Harnad, S. (2003) The Symbol Grounding Problem
What is the “symbol grounding problem,” and how can it be solved? (The meanings of words must be grounded in sensorimotor categories: words must be connected to what they refer to (“cat”), and sentences must have meanings: “The cat is on the mat.”)
Harnad, S. (2003) The Symbol Grounding Problem. Encyclopedia of Cognitive Science. Nature Publishing Group/Macmillan.
or:
https://en.wikipedia.org/wiki/Symbol_grounding
The Symbol Grounding Problem is related to the problem of how words get their meanings, and of what meanings are. The problem of meaning is in turn related to the problem of consciousness, or how it is that mental states are meaningful.
Other views:
Taddeo, M., & Floridi, L. (2005). Solving the symbol grounding problem: a critical review of fifteen years of research. Journal of Experimental & Theoretical Artificial Intelligence, 17(4), 419-445.
Barsalou, L. W. (2010). Grounded cognition: past, present, and future. Topics in Cognitive Science, 2(4), 716-724.
Bringsjord, S. (2014) The Symbol Grounding Problem... Remains Unsolved. Journal of Experimental & Theoretical Artificial Intelligence
Steels, L. (2008) The Symbol Grounding Problem Has Been Solved. So What's Next? In M. de Vega (Ed.), Symbols and Embodiment: Debates on Meaning and Cognition. Oxford University Press.
As discussed extensively in class, the symbol grounding problem asks how symbols in formal systems acquire meaning. Previously we made the crucial separation between pure computation, which manipulates tokens based on their form (not their content), and actual cognition. One step toward solving this problem is creating categorical representations. These invariant features help in picking out real-world categories, and connectionist methods provide a way to extract those stable features from noisy raw sensory input. However, it is important to make the distinction that connectionist learning does not entirely solve the symbol grounding problem on its own. Rather, it is a bridging tool for discovering the categorical invariants a symbol can eventually be grounded in. This partially answers my earlier questions about abstract concepts: although they cannot be directly grounded in perception and action, it is suggested they can be understood indirectly via symbolic composition, by combining already grounded symbols into higher-order structures.
Hi Emily! You’re exactly right. Pure computation manipulates symbols syntactically, but without grounding, they stay empty of meaning. Connectionist learning can help by carving up raw sensory input into stable categories, an important first step toward grounding, but it isn’t enough on its own. To actually ground a symbol, the system has to link those learned categories to real sensorimotor capacities, detecting, acting on, and predicting things in the world. Once some basic symbols are grounded in direct experience, more abstract concepts can be built compositionally, as you pointed out, by combining grounded primitives into higher-order structures. This layered approach shows how we might move from raw data to genuinely meaningful representations, and it highlights why embodiment and interaction remain essential in any full account of cognition.
Emily, excellent synthesis. Here’s a bit of anticipatory detail, which will become clearer in the next weeks:
Content-words (nouns, verbs, adjectives, adverbs) have referents (things in the world that they refer to: cat, catch, catastrophe).
It is not words but propositions (content-words combined in true or false subject/predicate sentences: “apples are round”, “the cat is on the mat”) that have meanings.
Most content-words are category-names (look in a dictionary!). Category-learning is learning to detect the features that distinguish the members of the referent category from the non-members (so the learner can recognize members perceptually and know what to do and not do with them).
“Deep learning” [what is that?] by neural networks is a current model of the mechanism that learns to detect the distinguishing features of categories (through unsupervised and supervised learning [what are those?]).
Categorization is “doing the right thing with the right kind of thing” (whether eating it or naming it).
There are two ways to “ground” content-words in their referent categories:
(1) direct sensorimotor grounding (i.e., unsupervised and supervised learning of the features that distinguish members from non-members, through exposure, trial and error, and corrective feedback) and
(2) indirect verbal grounding from the verbal description or definition of the features distinguishing members from non-members. This is grounding T2 in T3 (how?).
But the all-important precondition for grounding category names indirectly, through verbal learning, is that all the content-words (feature-names) in the definition of the new content-word must already be grounded (whether by direct or indirect grounding) for the hearer (and, of course, the speaker). Think about that, and explain why, if you can.
This will become clear as we get to category learning and language.
Ayla, another excellent synthesis: Can you answer any of the questions I asked Emily?
Thanks for pointing out that distinction between words and propositions because now I see more clearly that grounding a single content word is just the first step because that meaning begins when those grounded words get combined in propositions that can be true or false. In terms of the process of grounding, it seems that what is called "deep learning" provides a model of the way in which to learn the features that differentiate categories, including both unsupervised learning of detecting regularities and clustering features from raw input, and supervised learning of taking feedback or labels to inform the boundary of the category. This looks consistent with Harnad's (1990, 2003) framing of grounding, both with direct sensorimotor and indirect verbal grounding (as you are using it). The indirect -ai grounding Te grounding meaning T2 via T3 - works because if the words of the verbal definition are all already grounded, then Te takes its meaning from that verbal network; however, if those feature words are not grounded, that chain is broken. Barsalou (2010) speaks more on this layering where grounded primitives create abstract symbols, which are all based on the grounded sensorimotor layer of meaning.
Lorena, you’re putting things together well, but you lost kid-sib at “…-ai grounding Te...” (though it’s probably my fault for using too many abbreviations!). The predicate (“round, red fruit”) of a proposition is predicated (claimed to be true) of the subject (“apple”). The proposition is true if an apple really is a round, red fruit.
Think of that as a short cut to learning the distinguishing features of apples — an indirect, verbal path, much faster and easier than direct sensorimotor learning by trial and error.
But it only works if the content-words of the predicate (red, round, fruit) are already grounded (whether directly or indirectly) for the hearer. Otherwise you don’t learn what “apple” refers to.
(Forget Barsalou’s “abstract symbols”: Symbols are just labels. And all categories are “abstract”, in the sense that you have to detect and abstract their distinguishing features — whether directly or indirectly — in order to know what to do and not-do with their members. And propositions are not long words: their predicates ground their subjects, as long as the predicates are grounded.)
From my computer science classes, my understanding is that deep learning is any model with one or more hidden layers; that is, there is some layer of "neurons" in between the input layer (say, the layer taking in an image) and the output layer (say, the layer that spits out a 90% probability that the image is a dog).
Now, the difference between supervised and unsupervised learning is that supervised learning has a "correct answer"--the label/output--associated with each input, and the model has that information to try to learn to produce an output as close to that label as possible. Unsupervised learning does not have that "correct answer": all it has is a jumble of data, and its job is to pick out "meaningful" patterns that arise from that data.
e.g., giving a file of pixels together with the word "dog" (supervised), versus giving a scatter plot and figuring out which clusters exist in that plot (unsupervised).
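To make that distinction concrete, here is a minimal sketch in Python (my own toy example with invented data, not from the reading or the class): the same two-feature "sensory" data is learned once with the "correct answers" (labels) and once without them.

```python
# Toy illustration, plain NumPy: supervised vs. unsupervised learning on the same data.
import numpy as np

rng = np.random.default_rng(0)

# Two hidden kinds of thing, each described by 2 perceptible features.
kind_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
kind_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([kind_a, kind_b])
labels = np.array([0] * 50 + [1] * 50)      # the "correct answers"

# Supervised: use the labels to learn one centroid per category.
centroids_sup = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

def classify(x):
    """Assign x to the category whose learned centroid is closest."""
    return int(np.argmin(np.linalg.norm(centroids_sup - x, axis=1)))

# Unsupervised: no labels, just cluster the jumble of data (k-means, 10 sweeps).
centroids_uns = np.array([X[0], X[-1]])     # start from two arbitrary data points
for _ in range(10):
    nearest = np.argmin(
        np.linalg.norm(X[:, None, :] - centroids_uns[None, :, :], axis=2), axis=1)
    centroids_uns = np.array([X[nearest == k].mean(axis=0) for k in (0, 1)])

print(classify(np.array([2.8, 3.1])))       # supervised prediction: kind 1
print(centroids_uns)                        # unsupervised: the two clusters it found
```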
Hi Professor, thank you for your feedback!
Deep learning works through networks with many layers, where each layer picks out patterns in the input and passes them on. Early layers might detect simple features, while later layers combine those into more complex ones. In unsupervised learning, the system notices patterns on its own, clustering inputs that tend to appear together. In supervised learning, it gets feedback which helps it refine the features that actually matter for telling categories apart. Together, these processes allow the system to discover the stable features that define a category and separate members from non-members.
As for grounding T2 in T3, when a learner hears a verbal definition, the words in that definition only help if they are themselves already grounded in prior sensorimotor categories. For example, if a child has already grounded “round” and “red” through direct experience, then hearing “an apple is a round, red fruit” allows them to ground “apple” indirectly. But if none of the feature words are grounded, the definition collapses into more ungrounded symbols. This shows why every indirect verbal definition ultimately traces back to a core of directly grounded categories.
***EVERYBODY PLEASE NOTE: I REDUCED THE MINIMUM NUMBER OF SKYWRITINGS. BUT THE READINGS ARE **ALL** RELEVANT TO AN OVERALL UNDERSTANDING OF THE COURSE. SO, EVEN IF YOU DO NOT DO A SKYWRITING ON ALL OF THEM, AT LEAST FEED EACH READING YOU DO NOT READ TO CHATGPT AND ASK IT FOR A SUMMARY, SO YOU KNOW WHAT THE READING SAID — OTHERWISE YOU WILL NOT HAVE A COMPLETE GRASP OF THE COURSE TO INTEGRATE AND INTERCONNECT FOR THE FINAL EXAM.***
Hi Lucy and Ayla, I would like to join you in defining supervised vs. unsupervised learning. I agree more with Ayla, because the difference between these two is the interaction with the objects rather than having a “correct answer.” Unsupervised learning is learning categories by observing and analyzing the features of their members, but it is often not enough. Supervised learning is learning categories by not only observing the features but also by interacting with the objects through trial and error. The “feedback” mentioned by Ayla is the consequences of the actions you do with the objects. On the mushroom island, unsupervised learning is looking at the mushrooms and trying to see which ones are poisonous and which are not depending on their colours and shapes, while supervised learning is eating tiny bites of mushrooms to see which ones make you sick and which do not. The “labels” mentioned by Lucy would simply be the symbols that refer to categories, but they are not necessary for categorization (as we can see in non-human species).
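Here is a toy sketch (hypothetical; the features and numbers are invented, not from the reading) of that mushroom-island picture of supervised learning: the "label" is not handed over, it is the consequence of nibbling, and the learner's job is to discover which perceptible feature actually distinguishes the category.

```python
# Toy sketch: supervised category learning as trial and error with corrective feedback.
import random

random.seed(1)

FEATURES = ["spotted", "tall", "brown"]
weights = {f: 0.0 for f in FEATURES}        # how strongly each feature signals "poisonous"

def consequence(mushroom):
    """The world's feedback: in this toy island, spotted mushrooms make you sick."""
    return "sick" if mushroom["spotted"] else "fine"

def predict_poisonous(mushroom):
    return sum(weights[f] for f in FEATURES if mushroom[f]) > 0

for trial in range(200):
    mushroom = {f: random.random() < 0.5 for f in FEATURES}
    guess = predict_poisonous(mushroom)
    error = (consequence(mushroom) == "sick") - guess   # +1 missed danger, -1 false alarm
    for f in FEATURES:
        if mushroom[f]:
            weights[f] += 0.1 * error        # corrective feedback reshapes the feature weights

print(weights)                               # "spotted" ends up carrying the category
```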
The symbol grounding problem arises because the verbal/written Turing test (T2) can only test symbol manipulation and not true understanding, highlighted by the fact that LLMs are able to pass using the Big Gulp. The professor illustrates, by using Searle’s Chinese Room example, that only a machine that can interact with the world through perception and action, like something that could pass the robotic Turing test (T3), can truly understand. Consequently, a machine can only pass T2 if it can also pass T3.
Jesse C, good summary, but what do you mean by "understanding"? We know that Searle does not have it, for Chinese, when he executes the Chinese T2-passing recipe. Would grounding the content-words of a language provide it?
I think grounding is what separates symbol manipulation from real “understanding.” In Searle’s room the rules let him produce the right outputs, but the words don’t connect to anything he’s experienced. A T3-level system, by tying its words to actual perception and action, at least uses symbols in a way that refers to the world. That may not prove it feels, but it does push “understanding” beyond empty pattern-matching.
Instructor, if I may, I’ll answer the questions you asked Jesse C. For the first question, I have to try to interpret what Jesse was trying to say: I believe that adding “true” as a qualifier of understanding is misleading. According to Searle’s CRA (reliant on Searle’s Periscope), a T2-passing machine does not understand. Searle is able to attest to that because “it feels like something to understand”, and in the case of C=C, implementation-independence allows him to “generalize” his result to other hardware. So Searle showed that cognition can’t only be computation: if T2 could be passed by pure computation (without cheating), there would be no understanding from passing T2. This brings us to T2 requiring T3, and to your second question. From my understanding, if Searle grounded the content-words of Chinese, it would lead him to understand, whether he grounded the Chinese symbols (content-words) directly through sensorimotor interactions (i.e., T3, hence relating to “T2 needs T3”) or through indirect verbal grounding. Now, because Searle already has directly grounded symbols in English, I am assuming he wouldn’t require any direct symbol grounding for Chinese and could “learn” Chinese and understand the symbols by pure indirect verbal grounding from English verbal explanations (this wouldn’t be the case if the verbal explanations were in Chinese, because the words of those explanations would themselves need to be grounded). Am I wrong, or is it true that we only require one set of “core content-words”, in one single language, to ground the remaining words of that language plus all the words of all other languages indirectly (verbally, using the set of core content-words)?
Harnad (2003) explains that the symbol grounding problem is the question of how meaningless symbols like “teddy bear” get their meaning. A “teddy bear” is just a meaningless symbol until it is connected to our sensorimotor experiences of teddy bears (e.g. brown, fluffy, soft, huggable). He suggests this can be solved by tying symbols to the sensorimotor experiences we can perceive and act on in the real world. But even when symbols are grounded, there are limitations. Grounding is a necessary condition for meaningfulness but not necessarily a sufficient one. Cognitive science could, in theory, reverse engineer how a person is able to identify a teddy bear in the real world. However, whether grounding also entails consciousness (what it feels like to understand that something is a teddy bear) is not something we can ever really figure out.
Annabelle, good comments, but can you answer any of the questions I asked Emily, Ayla, and Jesse C?
To answer the question posed above about “what do you mean by understanding?” in Jesse’s post, I believe the question lies in which level of the Turing Test we take as a standard for cognition, where the core issue is the symbol grounding problem (Harnad, 2003). Symbols can be shuffled around by rules, but without grounding they’re meaningless — a machine could say “the cat is on the mat” without knowing what a cat or a mat actually is. That’s why T2 (text-only) isn’t enough. It’s usually seen as pure computationalism (just symbol manipulation), though as we discussed in class, it doesn’t have to be limited to that. T3 takes a step further by grounding words in sensorimotor experience, giving symbols their first real sensory meaning. Still, T3 only gets partial understanding — no higher-order reasoning, self-reflection, or abstract thought. T4 adds those capacities on top of grounding, making it the closest thing to full humanlike cognition. So T3 is necessary but not sufficient for understanding: without grounding, nothing means anything, but without T4 you don’t get the full richness of cognition.
Rachel, what is a category?
And what is abstraction?
Isn’t all categorization based on detecting and abstracting distinguishing features, whether from direct sensorimotor trial and error or indirect verbal description?
Yes, the brain can do it all, but how does T4-passing-capacity (which includes T2/T3-passing capacity) allow “higher-order reasoning, self-reflection, or abstract thought” that T2/T3-passing capacity does not?
Rachel! I really liked how you broke down the different Turing Test levels and tied them back to grounding, it made the distinctions between T2, T3, and T4 much clearer. I think your point that T3 gives us partial understanding is key, since it shows that grounding through sensorimotor experience is necessary but not the whole story.
Building on the follow-up question: maybe what separates T4 is its ability to abstract from categories, not just grounding “cat” and “mat” in direct experience, but reasoning about “animals” in general, or even reflecting on what it means to know something. That higher-order layer seems less about raw sensorimotor input and more about combining grounded categories into abstract thought.
The T4-passing capacity achieves higher-order reasoning, self-reflection, or abstract thought because it entails the causal substrates underlying human cognition. These higher cognition processes are instantiated, not merely mimicked. To echo my previous example with sleep k-complexes peaking when personally/emotionally relevant information is mentioned when someone is asleep, a T4 machine's internal mechanism would be expected to elicit the same internal response. The difference between T4 and T2/T3 is fairly simple, but we encounter the same problem we encounter when we ask the “how” human cognition question when dealing with T4. We can theorize that T4 would be like us - truly like us - but we can’t say how beyond “causal intrinsic mechanisms” because we don’t even know how to capture them within humans.
“...the nature of "higher" brain function is itself a theoretical matter. To "constrain" a cognitive theory to account for behavior in a brainlike way is hence premature in two respects: (1) It is far from clear yet what "brainlike" means, and (2) we are far from having accounted for a lifesize chunk of behavior yet, even without added constraints.” – Harnad 1990
Though this paper was written in 1990 and we have learned much about the brain since then, I still think it is too strong and somewhat unfounded an argument to say that T4-passing-capacity allows for “higher-order reasoning, self-reflection, or abstract thought” but T3-passing-capacity would not. Perhaps it is indeed true, but I see no strong reason as of yet to assume so given our current understanding. As Harnad explained, a T3-passing robot would be able to ground categories in sensorimotor experience, similarly to us humans. While it is possible this grounding ability may not necessarily be sufficient for the human-like sentience implied by “higher-order reasoning, self-reflection, or abstract thought”, it is unclear what the missing ingredient would be. It seems to me that based on our current understanding, we do not know enough about the emergence of such capacities to conclusively assert that only a being with the same internal design as us (i.e. T4-passing) could possess them.
If I only knew the rules for moving Chinese characters around without understanding them (like in Searle’s Chinese Room thought experiment), I wouldn’t actually know what they meant. What makes the words in my head meaningful, while the same words on a page, or inside a computer, remain meaningless and empty unless someone can interpret them?
The idea of ‘grounding’ is key: for words to be meaningful, they need to connect directly to the world, not just to the other words they're surrounded by. That’s why definitions eventually bottom out in sensorimotor experience; for example, I know what ‘red’ means because I can see it, not because I’ve read an endless chain of dictionary definitions of the word red. This is why some argue that a system needs both symbols and the ability to interact with the world in order for symbols to actually be grounded. A robot can categorize objects, but only because the humans that created that system had sensorimotor experiences and could program it to.
Where does consciousness come in? Even if a robot or system could ground its words through interaction and some sort of sensorimotor experience, would that guarantee it experiences meaning, or would it just be going through the motions? How can we tell if symbol grounding is enough to ensure meaning, or does consciousness add something that we can't totally explain?
Lauren, symbol-grounding (T3) certainly does not guarantee or explain sentience. The Hard Problem of explaining feeling-capacity is so hard that symbol-grounding cannot even be said to be necessary for sentience, let alone sufficient for it.
The only thing that T3 symbol grounding guarantees is the connection between content-words and their referents in the world. That is still only a robotic connection. Explaining whether, and how, and why it is a sentient robotic connection rather than a “zombie” robotic connection goes beyond the Turing Program of reverse-engineering and Turing-Testing (and Turing explicitly acknowledges it).
“Meaning” and “understanding,” are, at best, beyond the scope of the Easy Problem (of explaining doing-capacity), inasmuch as they refer to felt (rather than just “done”) meaning and understanding.
From my understanding, symbol grounding is what prevents symbols from being empty labels and gives them real reference. The rules of how symbols are connected to their referents would be taught to us as children. If I were trying to explain this to Kid-Sib (basically myself in this complicated context) I would say that the language you are taught, that being the English words and their referents or meanings, is the basic form of symbol grounding. We learn language by associating a word with something in the world that we can experience through one or more of our senses. From there, you can continue to ground symbols by using that basis. Is this the same for computers? If you make a computer and teach it one “language” or one basic set of rules referring to words and their referents, can the computer learn more from that standpoint?
Kaelyn, language is certainly learned, but it’s not clear to what extent it is taught. We get some verbal instruction in grammar, not unlike the instruction we get in reading, writing, swimming and maths. But language started long before formal education; its groundwork originated before language, and that nonverbal learning occurs in human children as well as in many nonhuman species. Our species seems to have evolved a unique, specific capacity to learn language with very little explicit instruction.
Once you’ve learned a category (“what to do with what”) directly, through sensorimotor trial and error, it’s trivial to assign it an arbitrary label, especially if your kin and kind are using the same label.
Propositionality [what is that?] is another matter, unique to our species, and related to a new way of learning categories that is unique to our species: by word of mouth. But that is learning through language rather than the learning of language itself.
If, by teaching computers language, you mean LLMs like ChatGPT: They are not really taught language; they are just trained to predict the next word in an enormous “Big-Gulp” database of words produced by grounded humans. LLMs learn to produce words that make sense to grounded human speakers: They can even provide explicit verbal instructions on grammar and vocabulary, using their grounded database. This is amazing, but it is not a result of having been taught language by verbal instruction.
The symbol grounding problem is the challenge of how abstract symbols get their meanings. The readings suggest it can be solved with grounding in sensorimotor categories.
The symbol system would need to be augmented to a T3 system with nonsymbolic sensorimotor capacities to interact with external objects. This capacity is necessary because picking out the external object, the “referent,” of a symbol requires physical, causal interaction with the environment. Elementary symbols, like our words, are grounded by linking them to internal representations, iconic/categorical.
This is interesting to me because LLMs like ChatGPT seem convincing in understanding language. For instance, it can provide an accurate description of the real-world animal when asked “What is a cat?” However, the symbol grounding problem indicates that this isn’t the case. Without sensorimotor capabilities, ChatGPT can’t ground its symbols to external referents. Rather, it can only manipulate symbols and connect them to other abstract symbols.
Adelka, symbols are not “abstract”, they are arbitrary (with respect to what?). It is categories that are abstract (how?).
Why does symbol-grounding require physical interaction with the referent?
Can you update the weasel-word “representation” used in 1990 to what we would say now?
From what I understood of the symbol grounding problem, symbols are arbitrary because they are just shapes or sounds. They do not have a particular connection to what they actually refer to: the word cat does not resemble an actual cat at all, and the connection between these three letters and the actual animal is taught to us. Categories are abstract because they are generalizations from several experiences rather than a single concrete characteristic that everything in the category shares.
As for why symbol-grounding requires physical interaction with the referent: without actually having the ability to interact with the real world, we get caught in an endless loop of defining something through definitions, without a real anchor for true comprehension. We cannot know what a cat is compared to dogs until we see one or touch one or hear one. This ties the symbol to the category too. We can't categorize if we do not have an understanding of what the symbol means, leaving our definition completely arbitrary.
A key nuance Harnad (2003) highlights is that “symbol grounding must be sensorimotor to avoid infinite regress”. He explains that symbols (words) must be grounded through sensorimotor functions, which allows robots to detect, perceive, and act on their referents (the things words refer to in the world). A key part of this process involves forming categories that separate members from non-members based on their physical characteristics, which can either be innate, acquired through trial/error with feedback, or learned through verbal definitions/descriptions (for humans). However, from my understanding, we cannot learn new categories by relying solely on definitions if the words in those definitions are not already grounded in our sensorimotor experiences. For instance, to understand what a tree is when defined as an “elongated trunk with supporting branches and leaves,” we need to know the real-life referents of the words “trunk,” “branch,” and “leaves,” and be able to tell it apart from non-treelike entities like poles.
Gabriel, very good grasp. Can you explain to kid-sib the relation between direct and indirect grounding? (key: feature-detection)
Here is my attempt at a kid-sib explanation of the relation between direct and indirect grounding:
Direct grounding is when you learn a word by detecting the features yourself through experience. For example, you learn what “apple” means by seeing, touching, and tasting apples, and noticing they are round, red, and edible. You could then make the distinction that an apple is a fruit by correlating the attributes that you just learned about an apple (like its edible-ness and that it has seeds) to what you know constitutes a fruit (assuming you know what constitutes a fruit). Indirect grounding is when someone tells you the features: “An apple is a round, red fruit.” But that only works if you already know what “round,” “red,” and “fruit” mean from direct grounding. Otherwise, the new word won’t stick.
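A small illustrative sketch (my own toy code, not the paper's model) of that relation: indirect verbal grounding as composition of already-grounded feature detectors, which fails exactly when a word in the definition has no detector behind it.

```python
# Toy sketch: each grounded content-word is backed by a detector learned through
# direct experience; a verbal definition grounds a new word only if every
# feature-word it uses already has a detector behind it.

grounded = {
    "round": lambda thing: thing["shape"] == "round",
    "red":   lambda thing: thing["colour"] == "red",
    "fruit": lambda thing: thing["has_seeds"] and thing["edible"],
}

def learn_from_definition(new_word, feature_words):
    """Indirect (verbal) grounding: compose detectors for the defining words."""
    missing = [w for w in feature_words if w not in grounded]
    if missing:
        raise ValueError(f"Cannot ground {new_word!r}: ungrounded words {missing}")
    grounded[new_word] = lambda thing: all(grounded[w](thing) for w in feature_words)

# "An apple is a round, red fruit" sticks, because its words are already grounded...
learn_from_definition("apple", ["round", "red", "fruit"])
thing = {"shape": "round", "colour": "red", "has_seeds": True, "edible": True}
print(grounded["apple"](thing))                      # True

# ...but a definition made only of ungrounded words does not stick.
learn_from_definition("pomme", ["rouge", "ronde"])   # raises ValueError
```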
I’m really fascinated by the concept that you can use symbols to ground other symbols (indirect grounding), which of course makes sense. When you have certain symbols grounded to referents, you can use them to describe others, like a dictionary. I’m curious to know what the bare minimum amount of grounded symbols is that could be used to create a basic understanding of the world. In essence, how much would T3 have to experience the world to pass the test? Which bare minimum sensorimotor capacities could be used to pass? Would it have to experience everything we do, or could it make shortcuts?
I agree that direct vs. indirect grounding is a neat idea. My only question is whether these methods are equally effective. Further, are there levels to grounding? Is 'apple' more grounded for me than a fruit extensively described using grounded content-words? If grounding is when you connect symbols to their referents, does this even really occur during indirect grounding? When this mystery fruit is described to us, is our referent the fruit itself or its description?
I took note of when this paper was initially published (1990) and I’m quite impressed that Dr. Harnad was, in a way, able to anticipate how the LLMs of today face this very problem of ungrounded symbols that he articulated. These LLMs manipulate symbols in highly complex ways, but they’re trained merely textually, and thus reflect only powerful mimicry. (This is analogous to his Chinese-to-Chinese dictionary example used in the text: impressive output, no intrinsic grasp of meaning.) His “bottom-up grounding” is reflected in the direction that modern AI is moving towards as well: systems that combine text, visual feedback, and interaction with the real world have begun to emerge as the next step towards intelligent machines. Further, his desire to implement a “Total Turing Test” that requires being able to engage in this kind of sensorimotor real-world interaction is also very relevant; I do think it's really quite remarkable that a paper written over 30 years ago was able to so clearly predict the challenges AI research faces today and the direction future research seems to be heading towards.
Elle,
I also found it interesting how forward-looking Professor Harnad’s whole framing of the symbol grounding problem feels today. As you mentioned, his “dictionary-go-round” analogy is basically what LLMs today embody: big textual correlations without any intrinsic meaning. The follow-up from 2007 only highlights the difference between them further; Professor Harnad distinguishes between grounding being a functional issue (linking symbols and sensorimotor categories) while meaning is a felt issue, something tied with consciousness. This tension seems to still be unresolved. On the one hand, multimodal AI seems to be pursuing the idea of "bottom-up grounding," linking text with vision and action. On the other hand, if an AI could pass the Total Turing Test, Professor Harnad would say this does not necessarily grant the "meaning" of the experiential kind. I think his foresight strongly recommends his work as central to the idea of grounding; it forces us to think about not just whether machines can use symbols but whether they can ever understand them in the way we do.
The symbol grounding problem is basically the issue of how symbols (like words or mental representations) get their meaning, instead of just being manipulated as empty shapes. Computation on its own only shuffles symbols according to formal rules, but it doesn’t explain how those symbols connect to the things they’re supposed to refer to in the real world. That’s why the unilingual dictionary example is so powerful: if you don’t already know at least some grounded words, you’ll just loop through definitions without ever reaching meaning. What stood out to me is that grounding requires connecting symbols to sensorimotor experience, so that a system can actually do something with the things the symbols refer to. This connects back to our discussions on cognition and computation; even if a system manipulates symbols flawlessly, without grounding, it’s not really understanding per se. So my question is: if robots could ground their symbols through perception and interaction, would that be enough for meaning, or is the conscious “feeling” of understanding still important?
Shireen, that’s a great question! As you mentioned, the symbol grounding problem suggests that symbols (such as words) acquire meaning through sensorimotor experiences – grounding them in perception, action, and shared context. If robots could ground their symbols in this way, they would achieve what Professor Harnad calls functional meaning, the capacity to use symbols effectively in the physical world. But as he points out, “grounding is a functional matter; feeling is a felt matter”, which marks the distinction between the symbol grounding problem and the mind/body problem. Therefore, “real” meaning would not only require grounding but also the state of feeling. Without this felt dimension, robots like T3 would remain at the level of functional grounding, never reaching conscious meaning.
The grounding problem is quite interesting, and I had never thought of it as also exploring the link between mental states and meaning. It seems probable that we get a basic vocabulary from our environment, the physical world and culture. For example, we learn what an apple is through our senses and what kindness looks like in the relationships we observe. From there, I think that our mental capabilities, notably our imagination, could help us conceive of abstract and even hypothetical concepts. Now, when it comes to the meaning related to mental states, it seems that it is just another way of framing the Hard Problem without getting any closer to a solution.
That's an interesting point about grounding, especially when we think about how meaning traces back through history. Studying ancient civilizations and early languages shows how many symbols, like words for natural elements, family roles, or tools, were directly tied to people’s lived experiences. These early grounded symbols formed the roots of more abstract concepts that evolved later. By examining the origins and use of language across time, we can see how meaning builds from sensorimotor experience into cultural and conceptual complexity. It also helps explain how shared understanding develops socially, not just individually, grounding symbols within collective human activity.
Someone can ground “zebra” because they already have a set of elementary symbols including “stripes” and “horse”. Elementary symbols are the “names” attributed to the otherwise nonsymbolic iconic and categorical representations that can pick out referents because they are grounded through direct sensorimotor interactions with the world. Representations alone are nonsymbolic because they are arbitrary. For example, “horse” and “cheval” are different “names”, but they pick out the same category of objects. The symbols “horse” and “stripes”, when combined into a proposition, can be interpreted semantically as a symbolic description. I find it fascinating that human languages have different category names for objects and categories (which I believe might be based on environment). Languages can express everything, but some of them have specific words for a category while others can only express it in a proposition. Do we all have the same categories? If I knew about a zebra first, would I ground it directly instead of indirectly grounding it with “stripes” and “horse”? If I knew “zebra”, “colours” and “stripes” as elementary symbols and saw a horse, would I refer to it with a proposition like “a zebra with no stripes and not white” (“zebra” + “no stripes” + “not white”) instead of a word, or would I end up revisiting all my categories?
I like your point about how categories depend on what’s already grounded. If you learned “zebra” first, then “horse” might indeed be grounded indirectly as “a zebra without stripes.” But eventually, direct sensorimotor exposure to horses would refine the category so it no longer relies on negation. This highlights how grounding is dynamic: we build categories from what we already know, but experience can reshape them. Does this flexibility of grounding suggest that categories are universal, or do they vary significantly across different environments and languages?
Just a thought, vaguely inspired by Aristotle's notion of a product of imagination being a compounding of previous objects of sensation: I think it is possible to say that it is not that categories are built from what we already know and are refined by sensorimotor exposure, but rather that categories become complexified and further inter-connected through exposure. Indeed, if we take the examples of "zebra" and "horse", a Canadian child's learning of these categories will likely be [exposure to horse -> creation of the category of horse], and then [exposure to zebra -> what is this -> a horse with stripes that lives on the African continent]. But I do not think that this process ends here. At some point, the child will, yes, learn to refine their understanding of a zebra, and then that these two animals are part of the same genus (they are both equines). There will therefore be two (or more) categories (horse and zebra) subsumed under the same meta-category (equines). These categories are in this way complexified.
As to your point that categories are universal: could we suggest that some of them are (those associated with facts that are observable in nature, e.g. zebras and horses are two different categories, but they belong to the same genus/meta-category) but others are not? In this latter group, I would as an example mention types of music or names of objects.
Sensorimotor abilities allow a machine to ground the symbols it manipulates in tangible objects it interacts with, through bottom-up grounding. In this regard, symbol-grounding may allow for meaning acquisition of symbols, as a link is made between the arbitrary symbol and the thing it represents, and, putatively, this link is the meaning: what connects it to the real world. Otherwise, symbols are devoid of meaning by themselves. The symbol-grounding problem does not stipulate that the real-life objects are devoid of symbols, but rather the opposite. There is no ‘meaning-linking’ problem, per se. However, in twins, it oftentimes is the case that a rudimentary, invented language arises, only understandable by both twins (Bakker, 1987). In this regard, I wonder whether the symbol-grounding problem even is the right problem and question to ask ourselves in trying to figure out the origin of meaning acquisition. Infant twins interact with the world and possess sensorimotor capabilities before they do language capabilities. As such, the need to describe reality appears before the conventional tool to describe it, language, is learned. This could be what pushes twins to develop these invented languages: they know the meaning of certain things, but this meaning is orphaned, and cannot be described to others. Perhaps the symbols themselves do not get grounded onto meaning, but instead act as the link through which meaning gets expressed.
That’s a really interesting take. I like how you connect twin languages to the grounding issue; it makes me think about whether we’ve been framing the problem too narrowly. Maybe meaning doesn’t get “poured into” symbols so much as symbols emerge as tools to express meanings we already have through sensorimotor interaction. In that sense, grounding might not be about attaching labels to objects but about finding a shareable system for private experience. The twin case shows how meaning can exist before conventional language, and symbols just provide the bridge. It reframes grounding as communication rather than pure representation.
“Is Google Image Search (GIS) grounded?” I once gave GIS a picture of my houseplant to test whether it could correctly identify my pothos, which it did. I then tested GIS on my other plants and again, it was right every time. This suggests that GIS can not only identify a plant as one but can also categorize plants into different kinds. That is, when given a unique picture of a specific plant, it knows what it has to do with it (naming the plant) based on the features that it detects (e.g. the shape of the leaves, the presence of various colors, etc.). As such, GIS is grounded in the sense that it can use sensory capacities (through the camera lens) to pick out a referent (the type of plant) from its symbols (the pixels in the picture). However, it cannot interact with the plant in the physical world, so it would not be entirely grounded, in my opinion.
This is really interesting, as it seems to suggest that grounding is more complex than simply sensorimotor interaction with the environment. Could we suggest that GIS is not at all grounded, simply basing ourselves on the fact that it has not acquired the knowledge of what the plant is through sensorimotor interaction with the plant, but rather through a computational process? There is, I think, a difference between learning what a fern is on a nature walk at age 5 (in a bottom-up fashion) and learning what a fern is through analysis of input and output in a top-down model that is greater than simply sensorimotor interaction with the object. So I do not think that this accurate analysis of the image is sufficient to say that GIS is grounded, because its identification of the fern is based exclusively on a top-down process, which does not stem from a bottom-up learning process.
The idea of the symbol grounding problem, and particularly the Chinese-Chinese dictionary analogy, reminded me of the logical positivist approach to science, which asserts that any scientific statement must consist only of observable, measurable terms and logical connecting words. For example, the term ‘bachelor’ is meaningless, but replacing that word with ‘unmarried man’ gives it meaning. This solution is not viable for the symbol grounding problem, as replacing words with more words becomes an infinite regress, but it does converge on the point that grounding a symbol/word requires us to interact with its referent. We need to physically interact (by observing or measuring) with a notion to ground it, and we can use these grounded symbols to build more complex ones.
ReplyDeleteIn today’s class we discussed the two methods of escaping the dictionary problem:
- Dictionary route: define words in terms of other words
- Direct grounding: tie words to sensory actions and experiences
The big question was: How many words actually need to be grounded in direct experience before the rest can be built from them?
To my surprise, the number is tiny. Apparently, just 500–1500 grounded words could serve as the foundation for understanding everything else in the dictionary. Even more surprising, there isn’t one magical set of words—many different combinations could work.
We also talked about efficient versus inefficient grounding words. A word like pterodactyl is too narrow to help much, while words tied to basic sensory categories (red, sweet, loud) or actions (eat, run, cut) are efficient because they combine flexibly to describe many other things.
But here’s where I’m stuck: I’m not sure I buy it completely. Imagine I had not just 1500 grounded words, but 3000, or even more. Would that really be enough to understand a word like cellophane? It feels like without directly experiencing it—seeing its texture, crinkling it in my hands—I wouldn’t really “get it.”
Which makes me wonder: What do we even mean by understanding? Is it enough to be able to categorize correctly—to “do the right thing with the right kind of thing”? Or does real understanding require something more?
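A minimal sketch of that "grounding kernel" idea (the five-entry dictionary below is invented for illustration, not from the class): starting from a small set of directly grounded words, a word becomes indirectly groundable as soon as every content-word in its definition is already grounded, and iterating that rule gives everything the kernel can reach.

```python
# Toy sketch: indirect grounding as a closure over a made-up mini-dictionary.

dictionary = {
    "apple":      ["round", "red", "fruit"],
    "fruit":      ["sweet", "part", "plant"],
    "plant":      ["living", "thing"],
    "wrapping":   ["thin", "cover"],
    "cellophane": ["thin", "transparent", "wrapping"],
}

def grounded_closure(kernel):
    grounded = set(kernel)
    changed = True
    while changed:                        # keep sweeping until nothing new grounds
        changed = False
        for word, definition in dictionary.items():
            if word not in grounded and all(w in grounded for w in definition):
                grounded.add(word)        # its whole definition is grounded, so it grounds
                changed = True
    return grounded

kernel = {"round", "red", "sweet", "part", "living", "thing",
          "thin", "transparent", "cover"}
print(sorted(grounded_closure(kernel) - kernel))
# ['apple', 'cellophane', 'fruit', 'plant', 'wrapping'] -- all reached indirectly
```

Whether reaching "cellophane" this way counts as understanding it, as the next comment asks, is of course a separate question from whether it is reachable at all.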
I get the idea of the “tiny set” of grounding words, but I’m still not fully convinced. Sure, if I know grounded words like transparent, plastic, shiny, and wrapping, I can piece together a decent definition of cellophane without ever touching it. But would that really count as understanding? Maybe it’s enough for functional purposes, to categorize correctly and use the word in context. Still, it feels like there’s a gap between that and the richer kind of understanding that comes from actually crinkling cellophane in my hands. Maybe real understanding always needs some degree of direct embodiment.
The symbol grounding problem highlights a fundamental challenge in cognitive science: symbols alone cannot generate meaning, since definitions endlessly refer to other symbols, like learning Chinese from a Chinese-only dictionary. This paper proposes a hybrid solution, grounding symbols in two nonsymbolic layers: iconic representations, which mirror sensory input, and categorical representations, which extract invariant features for identification. Connectionist networks then learn these features, enabling symbols to connect to the world. Higher-order symbolic reasoning emerges from these grounded basics. Anticipating modern debates in neurosymbolic AI, this framework suggests that meaning arises from integrating bottom-up perception with top-down symbolic composition.
I think that even if grounded content-words were understood by Searle in the Chinese Room, he still wouldn’t pass T3, despite having sensorimotor capacities (as is required for symbol grounding). If you had the sentence “The dog bit the man”, and understood the content-words “dog”, “bit”, and “man”, but didn’t speak the language, the machine still wouldn’t be able to ‘understand’ what the sentence means: it’s real-world experience that is telling us that the dog bit the man and the man (probably) did not bite the dog.
ReplyDeleteAside: since “Picking out referents is a dynamic (implementation-dependent) property,”, if the machine needs to pick out referents it is evidently not compatible with computationalism.
Hey Emma! I think the robot that Searle was operating could pass the T3 test if my following reflection is correct. The requirement for a T3 passing robot is to have the performance capacity of a T2 passing machine with the ability to ground its elementary symbols directly, through its sensorimotor performance capacity. From then, it should be capable of grounding of its higher-order symbolic representations in the elementary symbols. Searle, being integrated in that T3 robot’s system, would have a “robotic” understanding of Chinese. As stated in a previous response: “The only thing that T3 symbol grounding guarantees is the connection between content-words and their referents in the world. That is still only a robotic connection. Explaining whether, and how, and why it is a sentient robotic connection rather than a “zombie” robotic connection goes beyond the Turing Program of reverse-engineering…” Thus, I think where the confusion arises is that you may be referring to “sentient” understanding of Chinese on the part of Searle. It is true that human sentience allows us to make sense of propositions intrinsically (which is why you said we would know that most likely the dog bit the man and not the other way around). However, I believe that in the case of what is relevant to cognitive science and the question of reverse-engineering cognition, the problem of sentient understanding does not need to be (and can’t be) solved. Additionally, we progress with the assumption that the propositions are true (i.e., if the proposition is “the man bit the dog”, then the man truly did bite the dog).
What struck me about the symbol grounding problem is that it exposes a gap not only in machines but also in how we think about human language. We assume our words have meaning, but even for us, most words are learned through a web of other words. The reason this doesn’t collapse into circularity is that some of our vocabulary bottoms out in direct experience: things we can see, hear, or feel. That makes me wonder whether grounding is less about building an entire system from the ground up, and more about maintaining enough contact points with reality to keep the whole symbolic network stable. Could meaning, then, be more about how many anchors we have than about having a single foundation?
The symbol grounding problem raises the question of how symbols gain meaning beyond formal manipulation. From what I understand, grounding is essential because it connects language to the world through sensorimotor experience. Without such grounding, symbols remain unanchored. Their relationships to one another may form consistent structures, but they lack connection to actual referents. This explains why purely computational systems, like LLMs, can appear to “understand” language while in reality operating within a closed loop of ungrounded symbols. Grounding provides the missing causal link between words and their referent categories by enabling feature detection through direct interaction. Indirect grounding through verbal definitions then becomes possible only once the constituent words in the definition are themselves already grounded. This recursive structure ensures that meaning ultimately traces back to real-world perception and action rather than remaining suspended in abstraction.
ReplyDelete“How can you ever get off the symbol/symbol merry-go-round? How is symbol meaning to be grounded in something other than just more meaningless symbols? This is the symbol grounding problem.”
I think this passage perfectly captures the essence of the symbol grounding problem. If words only refer to other words, and if every symbol just points to another symbol, meaning never really lands anywhere real. Professor Harnad describes it as a “symbol merry-go-round,” and I believe that this metaphor showcases how definitions can keep looping without grounding. We said in class that except in mathematics, definitions are not exact. They’re always a bit defective, approximate, and circular. From what I understand, that’s what makes the grounding problem so important. It reminds us that understanding can’t just come from more symbols; at some point, it has to connect to experience, perception, or action. If we ground symbols in direct sensory experience, then words like apple start to gain meaning (through seeing, touching, or tasting), giving language something solid to build on.
In class we were asked to try to solve Zeno's paradox, where a tortoise is given a head start and then Achilles runs after it. How can he ever catch up if, in order to catch up to it, he first needs to get halfway to it, but before that halfway to the halfway, and so forth infinitely? This is my attempt to solve it in a kid-sib way.
I think the first step to resolve this paradox is to move into the real world. The paradox describes motion as moving through a sequence of infinitely many smaller and smaller steps. This assumes that space/time is a line that you can divide forever. However, this description seems to exist only through the lens of language's tendency to approximate the real world. In reality, motion is a continuous action, with time always moving forward, not stopping for those halfway points. Since time, speed and space are all related to each other (speed = distance/time), and time is always moving forward, you can say that by time X, Achilles will have reached the tortoise regardless of space (which, as far as I know, could also have smaller indivisible parts). Basically, I believe that the problem only arises when explaining motion using abstract descriptions.
I also decided to ask GPT for a concise, simple answer to the paradox and see how it differs from my attempt. Here is what GPT answered:
Zeno’s Achilles and the tortoise paradox suggests Achilles can never overtake the tortoise because he must first reach an infinite number of halfway points. The resolution lies in understanding that an infinite sequence of smaller and smaller steps can still add up to a finite time and distance. In modern terms, the times Achilles takes to reach each point form a geometric series—for example, 10 s, 1 s, 0.1 s, 0.01 s, and so on. This infinite series converges to a finite total (about 11.11 s in a typical setup), meaning Achilles overtakes the tortoise after a finite amount of time.
Zeno’s mistake was treating infinite divisibility as implying impossibility of completion. Mathematics—specifically limits and calculus—shows that continuous motion naturally involves infinitely divisible distances, but the sum of all those infinitesimal parts can still be finite. The paradox therefore exposes not a flaw in motion itself, but in our conceptual model of infinity. Motion, time, and space remain continuous and finite in practice; only our description breaks them into infinitely many parts.
It seems that GPT is taking an angle that I didn't really think about, which is pure math. We do take a similar angle when it says that the paradox exposes a flaw in our concept of infinity rather than in infinity itself, and when it talks about time. However, GPT's answer seems to me to sidestep the argument by burying it under math, which doesn't quite work for me, because in reality you are still taking those infinitely many steps even if they add up to a finite amount. Then again, it could very well be that GPT has a more thorough point that I don't understand, because I asked it to be concise.
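For what it's worth, the arithmetic GPT appeals to can be checked directly; a quick sketch of my own, using GPT's 10 s starting step and ratio 1/10:

```python
# Checking the arithmetic in GPT's answer: infinitely many catch-up steps, but the
# step times shrink geometrically (10 s, 1 s, 0.1 s, ...), so the total stays finite.
total, step = 0.0, 10.0
for _ in range(50):       # 50 terms is already indistinguishable from the limit
    total += step
    step /= 10.0
print(total)              # 11.111... s, i.e. 100/9: the finite time at which Achilles catches up
```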
Because I think they are a bit related, this is my attempt at explaining how to solve the Chinese dictionary problem in a kid-sib way.
Imagine you have a dictionary entirely written in Chinese, so whenever you try to look up a word you end up having to look up another word, since you don't understand Chinese. So, you just end up looking through every page forever. Similarly to the Zeno paradox above, you resolve it by moving this to the real physical world. The problem is that you are stuck in a world of symbols which you don't understand. The way to fix this is by grounding some of those symbols in reality by getting familiar with the features of what they represent. For example, you learn the symbol for cat by learning the features of a cat when you interact with a number of them in the real world (for example by seeing them or petting them). Now, whenever you see the symbol for cat in a definition, you will know what it is, as well as the other words you would use to describe a cat. Do this for enough words and you will eventually be able to look up what any word means in the dictionary.
I also asked GPT to solve it in a simple way, but our answers are too similar to point out anything interesting (maybe because we had discussed T3 and the symbol grounding problem before). Here is what it said, for anyone interested:
Imagine you find a giant Chinese dictionary, but every word is explained using other Chinese words. If you don’t know any Chinese, you’ll just keep flipping pages forever without learning what anything means. That’s the problem — words can’t explain themselves if they only point to other words.
To fix it, some words need to connect to real things. You learn “dog” by seeing a dog, hearing it bark, and petting it. Once you know that, you can understand new words like “animal” or “pet.”
So, to really understand language, some words must start with real experiences — not just symbols on a page. Without that connection to the world, a dictionary (or even an AI) would just be juggling empty words with no meaning behind them.
The symbol grounding problem presents a striking gap: even the most sophisticated AI or formal systems, when ungrounded, operate like a Chinese/Chinese dictionary and thus never reach intrinsic understanding. Harnad's hybrid model proposes bottom-up grounding through sensory and categorical representations. Yet, reading the article alongside the class discussion, I realized how much the model hinges not just on grounding symbols, but on enabling interaction with the world.
I believe that true grounding must also account for the ways systems adapt through interactive learning. Real-world categories shift as contexts, goals, and feedback evolve. For instance, humans may collectively revise the grounding of a concept (like “pandemic” or “cloud computing”) based on events or technology. If AI systems only ground symbols in fixed categories, they may fall short of human-like understanding. One way to bring us closer to authentic meaning would be by enriching Harnad’s framework with interactive, socially-driven revision of grounded symbols.
While reading Professor Harnad's paper, I realised that we humans frequently participate in a kind of mini Turing Test for grounded cognition: completing CAPTCHAs online. These are, in effect, practical demonstrations of the symbol grounding problem: they assume that recognizing a “bike” requires perceptual grounding, the ability to connect a symbol to real-world experience. Humans can prove their “humanness” to websites because they can identify objects whose meaning is shaped by perception and context, while traditional bots fail because they manipulate symbols without truly interpreting them. Ironically, AI vision models are now getting better at linking language to images and can sometimes pass CAPTCHAs. This raises an interesting question: as AI becomes more perceptually capable, will we need new forms of CAPTCHAs that test deeper, more experiential kinds of grounding, perhaps even something closer to feeling?
The idea of the Symbol Grounding Problem changed the way I think about computers and language. The system completely falls apart if you define "thinking" as symbol manipulation based only on the shape of the input, because there is then no way to give any of those symbols real meaning. This is where the term "Chinese Dictionary-Go-Around" comes from: each symbol is defined solely by a string of other symbols and thus never anchored to anything. Symbols become part of circular definitions, showing that pure symbol-processing has only syntax and lacks semantics.
To avoid this circular loop, we need to attach intrinsic meaning to these symbols, for instance through our sensorimotor experience. That is how the mind can categorize concepts and give symbols meaning. Hearsay, on the other hand, is a kind of learning that lets us shortcut that hands-on learning by acquiring new categories from verbal descriptions built out of words that are already grounded.
Most grounding work treats meaning as stable or fixed once a model has matched enough examples to a given symbol. Taddeo and Floridi point out that this assumes semantics are already in place, which is problematic because it violates the zero-semantical-commitment condition: a system that starts with fixed meanings can't explain how meaning forms at all. A grounded concept, though, should change when experience contradicts it or requires disaggregation. If a model operating in the real world keeps making failed predictions, it should have to revise what its words refer to. If a system learns what a “cup” is only from language, it might treat a “bucket” as the same thing, since both are containers for liquid. But if it actually tries to use them and notices that a bucket behaves differently, it has no option but to adjust its category. That correction means the concept “cup” is now grounded in the outcome of real interaction, not just in how often words co-occur. Testing that requires behaviour, not text (where GPT can deceive us). A T3-level agent that can act on its categories could be tracked as feedback forces it to redraw boundaries; if it reorganizes its vocabulary after those corrections and generalizes to new cases without retraining, might that be evidence of grounding? If it only updates correlations, it doesn't meet the zero-commitment standard; it's still predicting, not understanding.
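As a toy sketch of that revision loop (my own illustration, not anything from Taddeo and Floridi; the objects and features are invented): a "cup" category learned only from word co-occurrence lumps cups and buckets together until a failed interaction forces it to split on a new feature.

```python
# Toy sketch: a category gets revised when interaction contradicts it.
# Objects and features are invented for illustration.

category_cup = {"holds_liquid"}   # learned from text alone: cups and buckets both co-occur with "liquid"

objects = {
    "cup":    {"holds_liquid", "liftable_one_hand"},
    "bucket": {"holds_liquid"},
}

def matches(category_features, obj):
    """Does the object have every feature the category currently requires?"""
    return category_features <= objects[obj]

def try_to_drink_from(obj):
    """Simulated interaction: drinking only works if you can lift it with one hand."""
    return "liftable_one_hand" in objects[obj]

for obj in ["cup", "bucket"]:
    predicted_drinkable = matches(category_cup, obj)
    actually_drinkable = try_to_drink_from(obj)
    if predicted_drinkable and not actually_drinkable:
        # Prediction failed: tighten the category with the feature that made the difference.
        category_cup.add("liftable_one_hand")
        print(f"Revised 'cup' after failing with {obj}: {category_cup}")

print(matches(category_cup, "bucket"))  # False: a bucket no longer counts as a cup
print(matches(category_cup, "cup"))     # True
```

The revision is driven by the outcome of the (simulated) interaction, not by any further text, which is the point the zero-commitment condition is pressing on.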
When reading through this article, an example about symbolic representations confused me. If it is possible for someone to recognize a zebra solely on the basis of a symbolic definition, what does that mean about the nature of our knowledge and the way we learn? A lot of what we know comes from symbolic communication passed down through generations, like our knowledge of atoms, historical events and even God. Does this mean that symbolic systems, like language or even artificial intelligence, can really represent and even extend knowledge beyond what has been gained through experience? Or, as I am more inclined to think, does our understanding depend on someone having originally grounded those symbols in real perception some time ago?
I have been challenging my own "understanding" and what it really, truly means to understand. Harnad argues in this paper that words (or symbols) only gain meaning when they are grounded in sensorimotor experience, when the system using them can connect them to things in the real world. I get that, but it also made me wonder whether grounding is really enough. If a robot could perfectly link the word “apple” to real apples through perception and action, does that mean it knows what an apple is, or is it still just matching patterns? It also made me think of the mirror neurons we talked about last week, how they don't necessarily amount to real understanding, and of Fodor's point that meaning cannot just be neural activity. Harnad suggests that grounding closes that gap, but I am not sure it does. Maybe meaning isn't only about connection to the world, but about how that connection feels from the inside? If grounding explains use, what explains experience? Can something be meaningful without anyone there to experience the meaning?
Duru, you are right that there is more to meaning and understanding than just grounding. It also feels like something to mean something or understand something ("mirror" capacities). To reverse-engineer and explain grounding -- which is the capacity to learn to detect the distinguishing features of categories so you can do the right thing with them (e.g., the "mushroom island"), including naming and describing them (propositions, language) -- is the "Easy Problem" of cognitive science.
But explaining how and why we (or any sentient organism) can feel is the "Hard Problem," which is beyond the reach of reverse-engineering and Turing-testing (T2, T3, T4). We will discuss why in Week 10. Do you have any hunches?
The symbol grounding problem refers to how symbols lack intrinsic meaning unless they are grounded in real-world referents. The problem can be solved by a candidate that can see and act: for example, the words ‘cup’ and ‘handle’ are grounded for a candidate if, on being told that these are the defining features of a ‘mug’, it can correctly identify a mug in the physical world without ever having seen one before. For a candidate that has never had the capacity to acquire categories through seeing and doing, grounding is impossible, because its symbols can only refer to other symbols, with no world referents to break the loop. Identifying the invariant features that distinguish one category from another is what gives the symbols associated with those categories their intrinsic meaning. In an ungrounded system, symbols are arbitrary because there is no connection between a symbol’s shape and what it stands for. Therefore, candidates without sensorimotor capacities cannot empirically demonstrate that they can ground symbols, because they cannot construct new categories out of already grounded ones.
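A minimal sketch of that indirect-grounding test (my own illustration; the feature names and "detectors" are invented stand-ins for learned sensorimotor capacities): once 'cup' and 'handle' are tied to detectors, the verbal definition "a mug is a cup with a handle" is enough to pick out a mug on first encounter.

```python
# Toy sketch of indirect grounding: a new category ("mug") is defined verbally
# in terms of already-grounded ones ("cup", "handle"), and the candidate can
# then identify a mug it has never seen. Features are invented stand-ins
# for learned sensorimotor detectors.

grounded_detectors = {
    "cup":    lambda obj: "concave" in obj and "holds_liquid" in obj,
    "handle": lambda obj: "graspable_loop" in obj,
}

# Verbal definition ("hearsay"): a mug is a cup with a handle.
def is_mug(obj):
    return grounded_detectors["cup"](obj) and grounded_detectors["handle"](obj)

never_seen_before = {"concave", "holds_liquid", "graspable_loop", "ceramic"}
plain_glass       = {"concave", "holds_liquid"}

print(is_mug(never_seen_before))  # True: identified on first encounter via grounded parts
print(is_mug(plain_glass))        # False: no handle, so not a mug
```

Without the two grounded detectors, the definition would just be more ungrounded symbols, which is the loop the paragraph above describes.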