The Second Wave of AI Hits the Factory!
The news of the past few weeks has come in rapid succession. In early June, NVIDIA unveiled Cosmos 3, a physics-based world model designed to bring artificial intelligence directly into robotic environments. Almost simultaneously, Siemens presented the Intelligence Center X, a new industrial platform designed to bring AI agents and production workers together and transform industrial AI from isolated pilot projects into scalable applications.
12 Jun 2026
And just a few weeks earlier, Google had also unveiled a “world model” called Gemini Omni at its I/O developer conference, announcing investments of up to 190 billion euros for the project.
What unites these announcements is their direction: AI is meant to understand the physical world and act within it. Large language models can describe production, but anyone who actually wants to intervene in the factory needs a model of how that world behaves. This is precisely where world models come in, and they are currently shifting the balance in industrial AI. Within a matter of weeks, a term that had long interested mainly experts has moved to the center of the AI debate: the world model. Behind it lies a question on which billions in investment depend and which will determine the future of every automated factory.
The world consists of more than just text
The factory floor is where it is currently being decided which type of artificial intelligence will truly succeed. The large language models of recent years, such as ChatGPT, Gemini, and Claude, have impressed because they can talk about almost any topic. They draw their knowledge from vast amounts of text and return it in fluent language. In production, however, where a hundredth of a millimeter determines whether a part is scrap and a single misstep can paralyze a line, this capability reaches its limits. A model that knows the world only through text does not truly understand it. This is where world models come into play, and they are changing the rules of the game.
Eloquent, but blind to physics
A large language model can describe an assembly line as accurately as if it had stood right next to it for years. It recognizes error patterns, suggests improvements, and answers follow-up questions. But as soon as reality interferes—when a workpiece tips over, a gripper slips, or a machine starts up in a configuration no one anticipated—its confidence vanishes. The model lacks an internal picture of how things behave in space. It has read about physics without ever having experienced it.
“Very limited understanding of logic”
Yann LeCun, Turing Award winner and long-time chief AI scientist at Meta, is one of the sharpest critics of the LLM hype. Language models possess a “very limited understanding of logic,” he argues; they “do not understand the physical world, have no long-term memory, cannot think rationally, and cannot plan hierarchically.” In his view, this is not a matter of maturation time, but a limitation inherent in the design of these systems. Just how serious he is about this is illustrated by a personnel move that caused a stir in the industry: At the end of 2025, LeCun left Meta after twelve years and founded the startup AMI Labs in Paris, which focuses exclusively on world models. In March 2026, the company raised around one billion dollars—the largest seed funding round ever received by a European company.
Greg Brockman, co-founder of OpenAI, strongly disagrees, however: The path to general AI is in sight, he argues, and language models are leading the way. Any company deciding on its AI architecture today is, whether intentionally or not, aligning itself with one of these two camps.
A model with an internal representation of the world
World models are not simply larger language models, but a different approach. They do not primarily calculate which word is most likely to come next, but instead build an internal representation of their environment: objects that fall, forces that act, causes that lead to consequences. On this basis, they simulate what happens next even before it actually occurs. Instead of describing what happens, they play it out in advance.
The first reliable figures are available. In robotics, current studies show a performance leap of up to 30 percent when systems learn via such internal world model representations rather than directly from raw data. This is more than a cosmetic gain; it changes the very foundation on which such systems operate.
The difference can be illustrated by a simple process: if a component falls onto the conveyor belt, a language model can only describe afterward what happened. A world model anticipates the fall, detects the deviation early, and corrects the gripper before an error occurs.
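The idea of playing an event out in advance can be made concrete with a minimal sketch. It assumes a hand-written free-fall model in place of a learned world model, and every name in it (PartState, step, predict_impact_time, can_correct_in_time) is hypothetical and chosen only for illustration. Real systems learn their dynamics from sensor data; the point here is only the pattern of rolling a model forward and acting on the prediction.

```python
from dataclasses import dataclass
from typing import Optional

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class PartState:
    height: float    # metres above the conveyor belt
    velocity: float  # vertical velocity in m/s (negative means falling)


def step(state: PartState, dt: float) -> PartState:
    """Hypothetical one-step dynamics: free fall without drag (an assumption)."""
    velocity = state.velocity - G * dt
    height = state.height + velocity * dt
    return PartState(height, velocity)


def predict_impact_time(state: PartState, dt: float = 0.001,
                        horizon: float = 2.0) -> Optional[float]:
    """Roll the model forward; return seconds until the part reaches the belt.

    Returns None if no impact is predicted within the horizon.
    """
    t = 0.0
    while t < horizon:
        state = step(state, dt)
        t += dt
        if state.height <= 0.0:
            return t
    return None


def can_correct_in_time(state: PartState, reaction_time: float) -> bool:
    """True if an impact is predicted and the gripper can still react before it."""
    impact = predict_impact_time(state)
    return impact is not None and impact > reaction_time


if __name__ == "__main__":
    slipping_part = PartState(height=0.5, velocity=0.0)
    print("predicted impact in s:", predict_impact_time(slipping_part))
    print("gripper (0.1 s) can react:", can_correct_in_time(slipping_part, 0.1))
```

For a part released from rest 0.5 m above the belt, the rollout predicts an impact after about 0.32 seconds, so a gripper with an assumed reaction time of 0.1 seconds would still have time to correct. A system that only describes events after the fact has no such lead time.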
Research seeks a clear definition
For a long time, it remained unclear what exactly constitutes a world model—each research group interpreted the term differently. In April 2026, an international team proposed an initial authoritative framework with the open-source OpenWorldLib. According to this framework, a world model perceives its environment, interacts with it, and retains its states in memory. Text-to-video systems like Sora are explicitly excluded: they generate impressive images, but without feedback from the real world. Perhaps the most important observation of the work is that today’s language models already possess, in principle, the prerequisites to evolve in this direction. However, there is still a long way to go.
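The three properties named in this definition (perceiving, interacting, retaining states in memory) can be sketched as a tiny interface. This is a toy illustration under stated assumptions, not the OpenWorldLib API: the class ToyWorldModel, its methods, and the one-dimensional state are all hypothetical. The prediction error returned by perceive stands in for the real-world feedback that pure text-to-video generators lack.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyWorldModel:
    """Hypothetical illustration only; not the OpenWorldLib API."""
    belief: float = 0.0                                 # current internal state estimate
    memory: List[float] = field(default_factory=list)   # retained observations

    def perceive(self, observation: float) -> float:
        """Take in a sensor reading, store it, and return the prediction error."""
        error = observation - self.belief
        self.belief = observation
        self.memory.append(observation)
        return error

    def interact(self, action: float) -> float:
        """Apply an action to the internal state and return the expected result."""
        self.belief += action
        return self.belief


if __name__ == "__main__":
    model = ToyWorldModel()
    model.perceive(1.0)               # initial observation of the environment
    expected = model.interact(0.5)    # the model expects the state to become 1.5
    surprise = model.perceive(1.4)    # reality deviates slightly from the expectation
    print(expected, surprise, model.memory)
```

The loop of acting, predicting, and then comparing the prediction with a new observation is what separates this sketch from a system that only generates output without ever checking it against the environment.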
What the latest HANNOVER MESSE revealed
Just how quickly the field is shifting was evident at HANNOVER MESSE 2026. For the first time, “Physical AI” took center stage as a standalone theme: intelligence that operates not on screens, but within machines, systems, and robots. Three examples stood out in particular.
Agile Robots showcased Agile ONE, a humanoid robot that independently perceives its environment, makes autonomous decisions, and acts in real time in complex industrial situations without a fixed program. SEW-EURODRIVE introduced a configuration agent that allows machines and robots to be commissioned through dialogue. What is remarkable here: the system deliberately does without a classic LLM architecture and positions itself as an independent, European alternative. Finally, Siemens used a flexible shoe production facility to demonstrate what this is ultimately about: an AI that not only makes recommendations but also intervenes on its own.
Foundation models for so-called cross-embodiment transfer
In parallel, an approach is maturing in research that could fundamentally simplify the integration of robots: foundation models for so-called cross-embodiment transfer. A single model, trained on data from a wide variety of robot types, can be applied to machines it has never seen before. Instead of starting from scratch for each machine, it transfers existing knowledge to new hardware.
Language models and world models are not competitors
The Humanoid Robot Study 2026 by Tobias Bock (Nexery) offers a sobering assessment: the technology is leaving the lab, and the first industrial applications are a reality. However, robust autonomy and scalable integration are still lacking for widespread deployment. At the same time, China is significantly picking up the pace, and Europe must fight to keep up.
Three practical takeaways can be derived from this.
First, the right architecture for the task at hand is crucial. Language models and world models are not competitors but complement each other: language is suited to communicating with humans, while an internal world model is suited to intervening in physical processes. Confusing the two means using the wrong tool.
Second, the ability to plan ahead determines operational reliability. Whether a system assesses potential consequences or merely extrapolates from the past is not an academic detail. This determines whether a solution works only in a pilot project or can also withstand three-shift operation.
Third, data strategy is becoming a competitive factor. World models do not learn from text, but from physical states, sensor data, and real-world processes. Those who structure their production and sensor data today and make it usable are laying the foundation on which the next generation of AI can be built at all.
Less a distant vision than a matter of preparation
Language models have demonstrated how well machines can handle language. The next stage is more challenging: machines that understand their environment and act within it. For industry, this is less a distant vision than a matter of preparation. And that begins with one’s own data and the selection of the appropriate technology.