
Learning From the World Versus Learning About the World

2026-03-12 · 8 min read

Why the direction of learning matters for intelligence

Three approaches to building more general artificial intelligence are now visible.

The first, and most commercially dominant, is the large language model paradigm: train massive neural networks on vast text corpora and rely on compositional generalization to handle novel situations. Intelligence, under this view, emerges from having absorbed enough structure that new problems can be addressed by recombining what was learned during training.

The second is the world model paradigm, most prominently associated with Yann LeCun and his Joint Embedding Predictive Architecture (JEPA). Here, the emphasis shifts from language to sensory reality. The goal is to build rich internal simulators of how the physical world works, models that predict what will happen next in abstract representation space, and then use those models to plan actions by mentally simulating their consequences.

The third approach, and the one developed across the preceding essays, begins from a different premise entirely. Rather than asking how much structure can be learned before deployment, it asks: what is the minimal architecture that allows a system to begin learning from real experience, and to accumulate understanding over time through interaction with a changing world?

These three approaches share the conviction that current systems are missing something important. They diverge on what that something is, and more fundamentally, on which direction learning should flow.


What World Models Get Right

LeCun's critique of large language models contains several observations that align closely with the arguments developed in the preceding essays.

He has argued consistently that autoregressive token prediction is insufficient for general intelligence. Language, in his view, is too compressed and too discrete to capture the richness of physical reality. A system trained only on text cannot develop the kind of intuitive physics, spatial reasoning, and causal understanding that biological organisms acquire through sensory experience.

His proposed alternative, JEPA, addresses this by learning to predict future states of the world in an abstract embedding space rather than in raw pixel space. This is an important architectural choice. By predicting at the level of abstract representations rather than sensory details, JEPA can focus on the structural regularities that matter for reasoning and planning while ignoring the irreducible noise of moment-to-moment perception.

LeCun's broader architecture also includes a cost module for evaluating states, an actor for planning, and a configurator that directs attention. These are components designed for consequence-aware behavior rather than passive pattern completion.

These ideas resonate with several themes from the preceding essays. The emphasis on prediction coupled to consequence rather than pattern completion. The recognition that intelligence requires models of how the world actually works, not just statistical associations in text. The insistence that planning and reasoning require internal simulation.

On these points, there is substantial agreement.


Where the Approaches Diverge

The disagreement is not about whether intelligence needs world models. It is about how those models should come into existence, and what happens to them after they do.

The world model paradigm: learn structure, then deploy

LeCun's approach, as articulated in his 2022 position paper and in the early work at AMI Labs, follows a recognizable pattern. The system is trained on video, audio, and sensor data to build a rich internal model of the world. This training phase is where the heavy lifting occurs. The model learns abstract representations of physical reality through self-supervised learning on large datasets of real-world observations.

Once this model is sufficiently powerful, it can be used to plan actions, predict consequences, and reason about novel situations. The internal simulator enables the system to mentally explore possible futures before committing to an action.

This is a significant advance over the LLM paradigm. The representations are grounded in physical reality rather than in language alone. The architecture is designed for planning rather than just completion.

But the fundamental sequence remains: learn a model of the world, then use it.

The model is built from a large dataset of observations. Its quality depends on the diversity and richness of the training data. Once deployed, the model applies what it learned during training. The question of what happens when the world changes, when the model's understanding becomes outdated, is not the central focus of the architecture.

The approach developed here: begin small, remain coupled

The approach developed across the preceding essays inverts the priority.

Instead of asking how to build the richest possible model before deployment, it asks: what is the minimal stateful architecture that can begin learning from experience, and that accumulates understanding through sustained interaction with a changing environment?

Under this view, the system does not need to arrive in the world already equipped with a comprehensive model. It needs to arrive with the capacity to form and revise models as experience accumulates.

The emphasis is not on the breadth of initial training. It is on three architectural properties:

Statefulness. The system maintains persistent internal state that is modified by experience. Unlike current inference-time processing, which leaves the model unchanged, a stateful system carries forward what it has encountered. Discoveries do not vanish when the current interaction ends.

Consequence coupling. The system remains connected to the outcomes of its own predictions and actions. When expectations are violated, those violations become signals for revision rather than inputs to be processed and discarded.

Governed revision. The system can distinguish between noise and structural change, and it updates itself accordingly. Cautiously under irreducible uncertainty, more substantially when accumulated evidence points to genuine environmental shift.

The resulting system starts small. It may begin with modest representational capacity. But because it learns continuously from real interaction, its understanding grows over time in a way that remains aligned with the actual environment rather than with a historical dataset.
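The three properties above can be made concrete with a minimal sketch. The class below is illustrative only: the names, constants, and the single-scalar "model" are assumptions made for the example, not anything specified in the essay. It shows a system whose state persists across interactions, whose prediction errors feed back into that state, and whose update size depends on how much accumulated surprise it has seen.

```python
from dataclasses import dataclass

@dataclass
class StatefulLearner:
    """Illustrative sketch of the three properties:
    persistent state, consequence coupling, governed revision.
    All fields and thresholds are hypothetical choices."""
    estimate: float = 0.0          # persistent internal state, carried across interactions
    noise_rate: float = 0.05       # cautious update under ordinary, noisy error
    shift_rate: float = 0.5        # larger update when a structural shift is suspected
    shift_threshold: float = 3.0   # accumulated surprise needed to trigger revision
    surprise: float = 0.0          # decaying running evidence of structural change

    def predict(self) -> float:
        return self.estimate

    def observe(self, outcome: float) -> None:
        # Consequence coupling: the gap between expectation and outcome
        # is a signal, not a discarded input.
        error = outcome - self.estimate
        self.surprise = 0.9 * self.surprise + abs(error)
        if self.surprise > self.shift_threshold:
            # Governed revision: enough consistent evidence, revise substantially.
            self.estimate += self.shift_rate * error
            self.surprise = 0.0
        else:
            # Calibrated restraint: treat the discrepancy as likely noise.
            self.estimate += self.noise_rate * error
```

The point of the sketch is the loop shape, not the particular update rule: state survives each interaction, and the magnitude of revision is itself governed by accumulated evidence rather than fixed in advance.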


The Direction of Learning

The deepest difference between these approaches can be stated simply.

World models learn about the world and then operate in it. The model is built from observations, and then the model is applied.

The approach described here learns from the world while operating in it. The model is built through interaction, and it continues to be revised by interaction.

This is not merely a difference in training schedule. It reflects a different assumption about the relationship between knowledge and environment.

The world model paradigm assumes that the structure of the world can be captured in advance with sufficient fidelity that the resulting model will remain useful across a wide range of future situations. The richness of the training data and the quality of the learned representations determine how far the model can generalize.

The approach developed here assumes something more modest: that no model, however rich, will remain adequate indefinitely in a world that continues to change. The system's value therefore depends not only on the quality of its initial understanding but on its capacity to detect when that understanding has become incomplete, and to revise accordingly.

From this perspective, a small model that can learn and accumulate understanding through real interaction may ultimately prove more durable than a large model that captured the world at a single point in time.


What a Small Stateful Model Needs to Do

To be clear, "small" does not mean impoverished. It means that the system's initial representational commitment is deliberately limited, because the architecture is designed to grow through experience rather than to arrive fully formed.

A small stateful model must be capable of several things.

First, it must form expectations about its environment. These expectations need not be as rich as those produced by a fully trained world model. They must be sufficient to generate predictions that can be compared against what actually happens.

Second, it must register discrepancies between expectation and outcome. This is the foundation of learning from consequence. Without this signal, the system has no basis for revision.

Third, it must accumulate structure over time. When a discrepancy is consistent, when the environment reliably deviates from expectation in a particular way, the system must be able to encode that regularity as part of its persistent internal state. This is what distinguishes accumulation from mere reaction.

Fourth, it must govern its own revision. Not every discrepancy warrants a structural change. The system must maintain a practical distinction between noise (which calls for calibrated restraint) and genuine structural change (which calls for internal revision). Without this distinction, the system either over-adapts to fluctuations or fails to respond to real shifts.

None of these capabilities require the system to start with a comprehensive model of the world. They require it to start with the machinery for building one.
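The fourth capability, distinguishing noise from structural change, is the least familiar of the four, so a small sketch may help. The detector below uses a CUSUM-style accumulator, a standard change-detection idea chosen here as one plausible instantiation; the slack and threshold values are hypothetical. Small, zero-mean prediction errors decay away as noise, while consistent one-sided deviations accumulate until the system declares a genuine shift.

```python
class ShiftDetector:
    """Illustrative CUSUM-style detector. Deviations smaller than `drift`
    are treated as noise; consistent one-sided errors accumulate until
    they cross `threshold`. Both parameters are hypothetical."""

    def __init__(self, drift: float = 0.5, threshold: float = 5.0):
        self.drift = drift          # slack: per-step deviation written off as noise
        self.threshold = threshold  # evidence required to declare a structural shift
        self.pos = 0.0              # accumulated evidence of an upward shift
        self.neg = 0.0              # accumulated evidence of a downward shift

    def update(self, error: float) -> bool:
        """Feed one prediction error; return True if a shift is declared."""
        self.pos = max(0.0, self.pos + error - self.drift)
        self.neg = max(0.0, self.neg - error - self.drift)
        if self.pos > self.threshold or self.neg > self.threshold:
            # Reset after declaring, so the system can detect the next shift.
            self.pos = self.neg = 0.0
            return True
        return False
```

A detector like this is what would gate the "governed revision" step: isolated fluctuations never cross the threshold, while a sustained bias in the errors eventually does.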


Why Not Both?

A natural response is to suggest combining the approaches: begin with a rich pretrained world model and then add continual learning on top.

This is a reasonable engineering strategy, and the preceding essays have explicitly acknowledged that rich anticipatory structure may be a prerequisite for effective discovery. A model that already understands a great deal about the world is better positioned to detect when something has changed.

But the combination is not as simple as it might appear. There is a tension between the two strategies that deserves attention.

A system trained to build a comprehensive model of the world before deployment develops internal structures optimized for the patterns present in its training data. Those structures may be highly effective for the situations they were designed to capture. But they may also be difficult to revise when those situations change, precisely because they are so deeply embedded and so extensively optimized.

This is related to the well-known problem of catastrophic forgetting in neural networks. Large models that have been extensively trained on one regime can lose previously learned capabilities when adapted to a new one. The richer the initial training, the more there is to forget.

A system designed from the beginning for continual revision faces a different set of tradeoffs. Its initial representations may be less comprehensive, but its internal architecture is built to accommodate change. The structures it forms are intended to be updated, not preserved.

The engineering challenge, then, is not simply to bolt continual learning onto a pretrained model. It is to design architectures in which the capacity for revision is present from the start. Architectures where learning and revision are not separate phases but aspects of the same ongoing process.


An Analogy from Development

Biological intelligence illustrates the distinction.

An infant does not arrive with a comprehensive model of the world. It arrives with sensory systems, reflexes, some innate biases, and, critically, a nervous system that modifies itself in response to experience.

The infant does not need to have previously observed every kind of object, every physical interaction, or every social situation it will encounter. It needs the capacity to form expectations, observe outcomes, and revise its understanding as experience accumulates.

Over months and years, the infant builds an increasingly sophisticated model of the world. But at no point is there a sharp division between "training" and "deployment." The system is always learning, always revising, always operating.

The world model paradigm is closer to the strategy of learning a comprehensive atlas before setting out on a journey. The atlas may be extraordinarily detailed. But it was printed at a particular time, and the terrain continues to change.

The approach described here is closer to the strategy of learning to navigate. The navigator begins with less information but with the ability to update continuously based on what is actually encountered.


A Point of Agreement with LeCun

One point deserves emphasis: LeCun's position and the position developed here converge strongly on a central claim.

Both reject the assumption that scaling language models will produce general intelligence. Both hold that intelligence requires grounding in something beyond text: in the structure of the physical world, in the relationship between prediction and consequence, in the capacity to plan and reason about outcomes.

LeCun has argued that current LLMs lack persistent memory, that they cannot learn from individual interactions, and that their reasoning is reactive rather than deliberate. These observations align closely with the "discovery without accumulation" problem identified earlier and with the consequence-coupling argument developed in a companion essay.

The disagreement is about remedy. LeCun's remedy is to build better models of the world through improved training architectures. The remedy proposed here is to build systems that can form and revise their models of the world through sustained, governed interaction with it.

These may ultimately prove complementary. But they represent genuinely different bets about where the bottleneck lies.


The Bet

The world model paradigm bets that the bottleneck is representation: build a rich enough internal model, and the system will generalize effectively.

The LLM paradigm bets that the bottleneck is scale: train on enough data, and compositional generalization will cover most situations.

The approach developed here bets that the bottleneck is coupling: the system must remain connected to the consequences of its own predictions, and it must be able to revise itself when those consequences reveal that its understanding has become inadequate.

Under this view, the most important property of an intelligent system is not the richness of its initial model or the scale of its training data. It is the capacity to detect that its current model is no longer sufficient, and to begin building a better one from the evidence at hand.

A small system with this capacity may, over time, develop an understanding that is more aligned with the world as it actually is than a large system whose understanding was fixed at the moment training concluded.

Whether this bet is correct remains to be demonstrated. But the conditions under which intelligence originally evolved suggest it is worth taking seriously. After all, no organism that survived did so by learning everything it needed to know before entering the world.

It entered the world first. And it learned what the world required.
