

Embodiment and Abstraction in Artificial Intelligence: Building the Skyscraper of Intelligence

In recent years, embodiment has moved from the margins to the mainstream in artificial intelligence (AI), gaining traction in both academia and industry. Once a niche interest — championed by philosophers, enactivists, and a handful of forward-looking scientists — it is now widely seen as essential to the future of intelligent systems. Many argue that for machines to truly think and understand like humans, they must be embodied — grounded in a physical body that interacts with the world. Embodiment, while essential, is not the destination — it is the ground. It may provide a solid foundation for intelligence, but it does not, on its own, lead to the higher forms of mind we aspire to: deep understanding, long-range foresight, structured imagination, wisdom, perhaps even something approaching omniscience. Building intelligence is like constructing a skyscraper: it requires a solid base, but also the architectural vision to rise far above it. Focusing solely on embodiment risks mistaking the groundwork for the structure itself.

Embodiment in the Physical World: The Ground

When practitioners and scientists speak of embodiment, the term spans a wide spectrum — from the basic (having a robot body), to the deeper (the idea that cognition emerges from bodily experience), to the deepest: that learning and meaning arise through the self’s active participation in a world, whether physical, virtual, or even symbolic. Frameworks such as embodied cognition and enactivism hold that thinking doesn’t happen despite the body, but because of it.

In industry, however, embodiment is often embraced in its most basic sense: giving AI a body — a robotic shell with sensors and actuators. These systems are designed to replicate some aspects of the grounding that human cognition enjoys: learning cause and effect, understanding object permanence, interpreting gestures, and navigating physical and social environments. Yet this form of embodiment can only serve as part of the foundation. As with any skyscraper, a solid ground is just the beginning.

Large Language Models: The First Floor

Modern Large Language Models (LLMs) such as GPT, Claude, or Gemini represent a striking development in abstract cognition. Trained mainly on language — the compressed, symbolic distillation of human experience — LLMs are not directly connected to the physical world. Instead, they operate on the outputs of embodiment: the linguistic and cultural traces left behind by billions of human minds over time.

Nor is this reliance on abstracted inputs unique to language models. Most traditional deep learning systems — whether trained on text, images, audio, or videos — learn from human-curated datasets. These datasets are themselves the product of embodied human experience, but they are filtered, compressed, and stripped of context. In effect, such models are trained not on experience, but on the residue of experience. This approach can yield powerful pattern recognition, but it lacks direct grounding. Nuances of causality, physicality, intention, and context are often lost in translation — making it difficult for these systems to develop robust understanding of the world.
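
To make that contrast concrete, here is a minimal sketch of the conventional recipe in plain Python with NumPy; the toy dataset and its labels are invented for illustration, not taken from any real system. The point is structural: the model never acts in any world; it passively fits a fixed, pre-collected set of examples.

```python
# A minimal sketch of conventional supervised learning (assumes NumPy).
# The toy dataset is invented: features someone already extracted and
# labels someone already assigned, with the surrounding context stripped away.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # 200 curated examples, 4 features each
true_w = np.array([1.0, -2.0, 0.5, 0.0])  # hidden rule behind the labels
y = (X @ true_w > 0).astype(float)        # fixed, pre-assigned labels

w = np.zeros(4)                           # logistic-regression weights
for _ in range(500):                      # a purely passive optimization loop:
    p = 1.0 / (1.0 + np.exp(-(X @ w)))    # predict on the same static residue,
    w -= 0.1 * X.T @ (p - y) / len(y)     # then take a gradient step on log loss

p = 1.0 / (1.0 + np.exp(-(X @ w)))
print("training accuracy:", float(((p > 0.5) == y).mean()))
```

However well the loop fits its two hundred examples, nothing in it can probe the world that generated them.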

As a result, some researchers have turned their attention downward — seeking to reinforce the foundation through embodiment, hoping to ground these models more firmly in physical reality.

Reinforcing the Foundation

One important exception within traditional deep learning is Reinforcement Learning (RL), which enables agents to learn by interacting with an environment rather than by absorbing human-curated datasets. This aligns closely with the deeper interpretation of embodiment, in which learning and meaning emerge through the self’s active engagement in a world, whether physical, virtual, or even symbolic. RL brings us closer to this ideal by introducing agency, feedback, and situated learning.
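
To see what that loop looks like at its simplest, here is a minimal sketch of tabular Q-learning on an invented five-cell corridor; the environment, rewards, and hyperparameters are all illustrative, not a reference implementation. What matters is the shape of the computation: the agent's training data is generated by its own actions, and feedback arrives as reward from the environment rather than as a label from a curator.

```python
# A minimal sketch of the RL agent-environment loop: tabular Q-learning
# on an invented 5-cell corridor where reaching the right end pays +1.
# States, rewards, and hyperparameters are illustrative only.
import random

N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]; 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.3      # learning rate, discount, exploration

for episode in range(200):
    s = 0                                  # each episode starts at the left end
    while s != GOAL:
        # Agency: the agent chooses the experience it will learn from.
        if random.random() < epsilon:
            a = random.randrange(2)        # occasionally explore at random
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1
        s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        # Feedback: reward comes from the world, not from a curator.
        r = 1.0 if s_next == GOAL else 0.0
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("prefers 'right' in every cell:",
      all(Q[s][1] > Q[s][0] for s in range(GOAL)))
```

Even this toy anticipates the worry below: the reward is predefined, the world is a corridor, and "success" is fully specified in advance.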

However, in its current form, RL is often narrow and task-bounded, oriented around seeking predefined rewards in constrained environments. It typically lacks the open-ended, self-motivated participation that deeper embodiment implies. Despite these limitations, RL points in the right direction. It suggests that interactive, participatory learning can serve as a bridge between grounded experience and higher cognition, but to realize this, RL must be generalized beyond task success, toward systems that learn to live, not just win.

Higher Forms: Looking Upward

We must be cautious not to treat embodiment as the final destination of intelligence development. The goal is not to dwell endlessly in the basement of sensorimotor learning. Just as human infants learn by engaging with the world but eventually develop language, logic, and abstract reasoning, AI must move upward — toward higher forms of cognition that transcend immediate experience.

Abstraction is where intelligence generalizes, plans, reflects, and theorizes. It’s where we find mathematics, science, ethics, and strategy. These functions are less about direct embodiment and more about symbolic manipulation, analogical thinking, and multi-domain reasoning.

To focus only on embodiment is to dig deeper into the ground. To focus only on abstraction is to risk building a tower without a stable base. True progress lies in the integration of both.

A Skyscraper Model of Intelligence

In this view, the development of AI resembles the construction of a skyscraper:

  • The ground is embodiment — the physical and interactive grounding that roots symbols in experience.
  • The first floor is what LLMs provide — abstract cognition built on the symbolic residue of embodied human culture.
  • The upper floors are still under construction — layers of higher intelligence, perhaps even forms we cannot yet imagine.

Each layer depends on the integrity of the one beneath, but no single layer is sufficient by itself. Building a powerful intelligence system requires us to both solidify the foundation through richer forms of embodiment and extend upward through more capable and general forms of abstract reasoning.

This metaphor captures a central tension in AI research: we must look down to anchor intelligence in the real world, but we must also look up to ensure it scales to the full complexity of thought.

Conclusion: Embodiment and Abstraction as Co-Architects

Embodiment is not the whole key to intelligence, but it is a vital structural element. Likewise, abstraction is not a luxury or a detour — it is the mechanism by which intelligence scales, reflects, and innovates. Large Language Models, while sometimes criticized for their disembodied nature, reveal how far abstraction alone can take us — and how much further we can go by building upward.

The future of AI lies in the integration of these dimensions. Embodiment helps build a solid and well-grounded base. Abstraction lets us design higher, broader, more flexible architectures of thought. To build intelligent machines — or to understand our own minds — we need both the earth beneath our feet and the sky above our heads.


