The Road Ahead for AI: A Shift Towards Autonomous Machine Intelligence

Yann LeCun, Chief AI Scientist at Meta and Professor at NYU, presented a thought-provoking vision for the future of artificial intelligence at the AI Action Summit 2025. LeCun, a recipient of the prestigious ACM Turing Award, argued for a significant departure from current dominant AI paradigms, particularly large language models (LLMs), to achieve human-level intelligence. He emphasized the need for “human-level AI” not just as a scientific pursuit but as a product necessity, envisioning a future where smart devices will host ubiquitous AI assistants requiring sophisticated intelligence for seamless human interaction.

The Limitations of Current AI: Why Machine Learning “Sucks”

LeCun was frank about the shortcomings of present-day machine learning, stating, “machine learning sucks compared to what we observe in humans and animals”. The core issue, he explained, lies in the fundamental differences in learning abilities, common sense, and understanding of the physical world between machines and biological intelligence. Humans and animals possess extensive background knowledge, enabling rapid learning of new tasks, understanding of world mechanics, and the ability to reason and plan based on “common sense”.

Current AI systems, particularly LLMs, operate by auto-regressively producing tokens: each new token is predicted from the tokens before it, including the model’s own previous outputs. While efficient for generating text, this method suffers from exponential divergence. Once generated tokens deviate from the set of reasonable answers there is no mechanism for correction, and the probability of staying on track shrinks with every additional token, which is the root of the familiar “hallucination issues”.
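To see why the errors compound, consider a back-of-the-envelope model (this framing is an illustration, not LeCun’s exact formulation): assume each generated token has a small independent probability of leaving the set of acceptable continuations, with no way to recover afterwards.

```python
# Back-of-the-envelope model of exponential divergence: assume each
# generated token has an independent probability `eps` of stepping
# outside the set of reasonable continuations, with no way to recover.

def p_still_reasonable(eps: float, n_tokens: int) -> float:
    """Probability that an n-token completion never goes off track."""
    return (1.0 - eps) ** n_tokens

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} tokens: {p_still_reasonable(0.01, n):.2e}")
# With eps = 1%: ~9.0e-01 at 10 tokens, ~3.7e-01 at 100, ~4.3e-05 at 1000.
```

Even a one-percent per-token error rate leaves almost no chance that a thousand-token answer stays on track, which is the intuition behind the divergence claim.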

LeCun highlighted a stark paradox: while AI can pass the bar exam, solve math problems, and prove theorems, it cannot yet replicate the seemingly simple feats of a cat or a 10-year-old child. A house cat can understand the physical world and plan complex actions, and a 10-year-old can clear a dinner table “zero shot” without prior training. This “Moravec’s Paradox” suggests that tasks humans find easy (like perception and mobility) are difficult for AI, while tasks humans find hard (like complex calculations or language generation) are relatively easy for AI.

The data discrepancy further underscores this point. A typical LLM is trained on roughly 30 trillion tokens, equivalent to all publicly available text on the internet, which would take a human almost half a million years to read. In contrast, a four-year-old child processes a comparable volume of visual data (approximately 10^14 bytes) in just 16,000 waking hours. This immense difference in data efficiency strongly suggests that human-level intelligence will not be achieved solely through text-based training.
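The arithmetic behind these figures is easy to reconstruct. The sketch below uses rates of the kind LeCun typically quotes (about 0.75 words per token, a 250-words-per-minute reading speed, eight hours of reading a day, and roughly two million optic-nerve fibers carrying on the order of one byte per second each); these are assumptions for illustration, not precise measurements.

```python
# Rough reconstruction of the data-volume comparison. Assumed rates,
# not exact measurements: ~0.75 words per token, 250 words/min reading,
# 8 hours of reading per day, ~2 million optic-nerve fibers at ~1 byte/s.

llm_tokens = 30e12                                  # ~30T training tokens
reading_minutes = llm_tokens * 0.75 / 250           # total minutes to read
reading_years = reading_minutes / 60 / 8 / 365      # reading 8 h per day
print(f"Reading the corpus: ~{reading_years:,.0f} years")   # ~514,000

visual_bytes = 2e6 * 1 * 16_000 * 3600              # fibers * B/s * seconds
print(f"Visual input by age 4: ~{visual_bytes:.1e} bytes")  # ~1.2e14
```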

Human infants, in their first few months of life, acquire a vast amount of background knowledge about the world, including concepts like object permanence, solidity, rigidity, and intuitive physics (gravity, inertia). This learning occurs primarily through observation and interaction, with a surprisingly small amount of interaction required.

Towards Advanced Machine Intelligence (AMI): A New Paradigm

To overcome these limitations, LeCun proposed a shift towards what Meta calls “Advanced Machine Intelligence” (AMI), preferring this term over “Artificial General Intelligence” (AGI) due to the specialized nature of human intelligence. AMI systems, he argued, must possess several key capabilities:

  • Learning World Models: The ability to build mental models of how the world works from sensory input, which can be manipulated mentally.
  • Persistent Memory: Systems capable of retaining information over time.
  • Hierarchical Planning: The capacity to plan actions at multiple levels of abstraction, from high-level goals to low-level muscle control.
  • Reasoning: The ability to logically deduce and infer.
  • Controllable and Safe by Design: AI systems that inherently prioritize safety and can’t be “jailbroken”.

LeCun advocated for a change in the type of inference performed by AI systems, moving from the fixed-layer processing of current LLMs (which he likened to “System 1” thinking in Kahneman’s sense: fast, intuitive, subconscious) to an “energy-based model” approach. This “System 2” thinking allows the AI to spend more computational effort on complex problems, similar to how humans “think about something before doing it”. In this model, an “energy function” scores how incompatible a proposed output is with an observation, and inference consists of searching for the output that minimizes this energy.
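A minimal sketch of what “inference as optimization” means in practice, assuming a toy differentiable energy function (the quadratic form below is invented purely for illustration):

```python
import torch

# Toy energy-based inference: given an observation x, search for the
# output y that minimizes an energy E(x, y) measuring incompatibility.
# The quadratic energy below is an invented stand-in for illustration.

def energy(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return ((y - 2.0 * x) ** 2).sum()      # low energy when y is near 2x

x = torch.tensor([1.0, -0.5])              # fixed observation
y = torch.zeros(2, requires_grad=True)     # candidate output, to be refined
opt = torch.optim.SGD([y], lr=0.1)

for step in range(100):                    # "thinking longer" = more steps
    opt.zero_grad()
    e = energy(x, y)
    e.backward()
    opt.step()

print(y.detach())                          # converges toward [2.0, -1.0]
```

Because the answer is produced by an iterative search rather than a fixed forward pass, harder problems can simply be given more optimization steps, which is the “System 2” property LeCun contrasts with LLM inference.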

Joint Embedding Predictive Architectures (JEPA) and World Models

A core component of LeCun’s proposed architecture is the “World Model”. This model, given a current estimate of the world state and an imagined action sequence, predicts the resulting state of the world. This allows for “planning by optimization,” where the system searches for an action sequence that minimizes a task objective and satisfies “guardrail objectives” (constraints ensuring safe behavior). These guardrails would be explicitly implemented and hardwired, making the system inherently safer.
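A schematic sketch of planning by optimization, with the world model, task cost, and guardrail penalty all stubbed out as illustrative placeholders:

```python
import torch

# Schematic "planning by optimization": search for an action sequence
# that drives a learned world model toward a goal while paying a
# penalty for violating guardrail constraints. All three functions
# here are invented stand-ins, not LeCun's actual models.

def world_model(state, action):            # predicts the next latent state
    return state + action                  # trivial dynamics for the sketch

def task_cost(state, goal):
    return ((state - goal) ** 2).sum()     # distance to the goal state

def guardrail_cost(state):
    return torch.relu(state - 1.5).sum()   # penalize leaving a "safe" box

state0 = torch.zeros(2)
goal = torch.tensor([1.0, 0.5])
actions = torch.zeros(5, 2, requires_grad=True)   # plan of 5 actions
opt = torch.optim.Adam([actions], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    s, cost = state0, 0.0
    for a in actions:                      # imagine the rollout
        s = world_model(s, a)
        cost = cost + guardrail_cost(s)    # safety term at every step
    cost = cost + task_cost(s, goal)
    cost.backward()
    opt.step()

print(actions.detach().sum(dim=0))         # total motion approaches the goal
```

Here the guardrails enter as soft penalty terms for simplicity; in LeCun’s proposal they are hardwired constraints on the optimization rather than terms the planner may trade away against task progress.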

LeCun introduced the “Joint Embedding Predictive Architecture” (JEPA) as a crucial innovation for building such world models. Unlike generative architectures that attempt to predict exact pixels or tokens, JEPAs predict abstract representations of what’s happening. This is vital because the world is inherently non-deterministic, and exact prediction is often impossible. By learning representations that eliminate unpredictable details, the prediction problem becomes significantly simpler.
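A minimal skeleton of the idea, with invented layer sizes; the essential point is that the prediction loss lives in representation space rather than pixel space:

```python
import torch
import torch.nn as nn

# Minimal JEPA skeleton (invented sizes): two encoders map the current
# and next observation into an abstract representation space, and a
# predictor predicts the next representation, not the next pixels.

enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
target_enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
predictor = nn.Linear(32, 32)

x_t = torch.randn(16, 784)                 # current observation (batch)
x_next = torch.randn(16, 784)              # actual next observation

s_t = enc(x_t)
with torch.no_grad():                      # target branch is not backprop'd
    s_next = target_enc(x_next)            # (e.g. an EMA copy, as in I-JEPA)

loss = ((predictor(s_t) - s_next) ** 2).mean()   # error in latent space
```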

Training JEPAs involves minimizing a cost function that measures the divergence between the representation of the actual next state and the predicted representation. To prevent the encoder from “collapsing” (ignoring its input and producing a constant output), regularization methods are employed. LeCun highlighted methods like VICReg (Variance-Invariance-Covariance Regularization) and distillation-based approaches such as DINO and I-JEPA, which have shown promising results in learning generic, high-quality features from images and videos.
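The variance and covariance terms of VICReg can be written down compactly. The sketch below follows the published recipe; the invariance term is simply the prediction error shown in the JEPA skeleton above.

```python
import torch

# Sketch of VICReg's variance and covariance regularizers on a batch
# of embeddings z with shape [batch, dim].

def vicreg_terms(z: torch.Tensor, eps: float = 1e-4):
    # Variance: keep each dimension's std above 1 so outputs can't collapse
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(1.0 - std).mean()
    # Covariance: decorrelate dimensions (off-diagonal covariance -> 0)
    zc = z - z.mean(dim=0)
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / z.shape[1]
    return var_loss, cov_loss

z = torch.randn(64, 32)                    # a batch of embeddings
var_loss, cov_loss = vicreg_terms(z)
```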

A particularly interesting finding with these systems is their ability to detect “strange” occurrences in videos, indicating a nascent level of “common sense”. By observing the prediction error, the system can identify events where objects spontaneously disappear or change shape, signaling a deviation from its learned understanding of intuitive physics.
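Conceptually this amounts to thresholding the world model’s prediction error over time; a toy sketch with made-up error scores:

```python
# Toy "intuitive physics violation" detector: flag frames where the
# world model's prediction error spikes. Scores and threshold invented.

errors = [0.11, 0.09, 0.10, 0.12, 0.95, 0.88, 0.10]  # per-frame pred. error
threshold = 3.0 * (sum(errors[:4]) / 4)              # e.g. 3x the baseline

for t, e in enumerate(errors):
    if e > threshold:
        print(f"frame {t}: physically implausible event (error={e:.2f})")
```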

Recent work on the “DINO World Model” (DINO-WM) demonstrates how DINO features can be combined with an action-conditioned predictor to create a functional world model for planning. This allows a robot to imagine actions and predict future states, enabling it to plan sequences of movements toward a target state by minimizing distance in the representation space. LeCun showed a video of a robot successfully manipulating blue chips in a complex environment, having learned the dynamics solely from observing (state, action, next state) sequences. This approach is reminiscent of “Model Predictive Control,” a classical concept in optimal control.
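This is essentially receding-horizon control with a learned latent dynamics model. A schematic random-shooting variant, with every component stubbed out for illustration (DINO-WM itself learns the predictor from observed transitions):

```python
import torch

# Schematic Model Predictive Control in representation space: sample
# candidate action sequences, roll them through a (stub) latent
# predictor, keep the plan whose final state is closest to the goal's
# latent, execute its first action, then re-plan. Everything here is
# an invented placeholder.

def predictor(z, a):                        # stub latent dynamics
    return z + a

z = torch.zeros(8)                          # current latent state
z_goal = torch.ones(8)                      # latent encoding of goal image

for step in range(10):                      # receding-horizon loop
    cands = 0.3 * torch.randn(256, 4, 8)    # 256 candidate 4-step plans
    finals = z.expand(256, 8).clone()
    for t in range(4):                      # imagined rollout of each plan
        finals = predictor(finals, cands[:, t])
    dists = ((finals - z_goal) ** 2).sum(dim=-1)
    best = cands[dists.argmin()]            # lowest representation distance
    z = predictor(z, best[0])               # execute first action, re-plan
```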

Recommendations and The Future of AI Platforms

LeCun concluded with strong recommendations for the AI research community:

  • Abandon generative models in favor of JEPAs.
  • Utilize energy-based models over probabilistic models.
  • Favor regularized methods over contrastive methods for training.
  • Reduce reliance on reinforcement learning, as it is often inefficient.
  • Academics should avoid LLM research due to intense competition with well-resourced industry labs. Instead, focus on unsolved problems like efficient planning algorithms, JEPAs with latent variables, planning under uncertainty, hierarchical planning, and learning cost modules.

Looking to the future, LeCun envisions “universal virtual assistants” mediating all human interactions with the digital world. He stressed that such foundational models must be open-source and widely available to prevent a handful of companies from monopolizing this critical technology. Training these models is expensive, but fine-tuning them for specific applications can be relatively cheap, making open-source platforms crucial for broader access and innovation. These platforms must be inclusive, understanding all world languages, cultures, and value systems, necessitating collaborative or distributed training efforts.

LeCun warned against the danger of geopolitical rivalry leading governments to restrict open-source AI, arguing that secrecy in research inevitably leads to falling behind. He believes that open-source models are already “slowly but surely” overtaking proprietary models. His presentation painted a clear picture of the scientific and societal imperative to evolve AI beyond its current form, moving towards truly intelligent and autonomous machines that can understand and interact with the world in a human-like way.

