World model concept map

World model concept map from AI video to spatial worlds.

Use the world model concept map to connect AI video, spatial computing, digital twins, physical AI, and generated worlds.

Virtual worlds, AI video, Spatial computing, Physical AI
Concept map

Core sentence

Virtual worlds were built by humans. Now AI is learning to generate, control, and simulate them.

This sentence is the spine of the site. Minecraft and Roblox explain the user mental model. The metaverse explains persistence and social space. Vision Pro explains spatial computing. EMO, Veo, Wan, Kling, and Ray explain controllable video. Cosmos and digital twins explain simulation. World models connect all of it into one technical future.

Scene explainer

A three-step visual map.

The map works best as a consumer story: start from what people already know, then reveal the new layer.

01

Known worlds

Games and the metaverse taught the interface.

People already understand avatars, spaces, inventories, maps, and shared places.

02

Generated media

AI video made the world visible.

Synthetic scenes became easy to watch, remix, and share, but still behaved like clips.

03

World models

The next layer is enterable and controllable.

The scene starts to remember space, respond to action, and support agents or physical simulation.

Concept flow

How the world model concept map connects familiar ideas.

01

Past interface

Human-built virtual worlds

Before world models, people already understood avatars, sandbox worlds, social rooms, and user-built spaces.

02

Current surface

AI-generated video and humans

AI video is the visible surface of the shift. The deeper issue is control, consistency, and memory across time.

03

Spatial interface

Spatial computing and immersive access

Vision Pro and spatial computing are not the same thing as world models. They are how generated worlds may be seen and operated.

04

Industrial layer

Simulation, digital twins, and physical AI

The industrial version of world models is not entertainment. It is simulation for robots, vehicles, factories, and cities.

05

Core capability

World models

The core shift is from generating isolated outputs to modeling how a world changes under time, viewpoint, and action.

Past interface

Human-built virtual worlds

Before world models, people already understood avatars, sandbox worlds, social rooms, and user-built spaces.

Minecraft-style identity

Blocky avatars

Simple characters make virtual presence easy to understand. The form is basic, but the mental model is powerful: a person can enter a world.

Minecraft, Roblox avatar, Voxel worlds

World models inherit the question of presence: who is inside the generated world, and can that identity persist?

Buildable spaces

Sandbox worlds

Minecraft and Roblox trained users to expect worlds that can be modified, extended, and shared.

Minecraft blocks, Roblox experiences, UGC worlds

AI world generation becomes more valuable when generated spaces are editable instead of disposable.

Persistent social space

Metaverse

The metaverse idea framed virtual worlds as social, persistent, and identity-driven, even when the tooling was still manual.

Meta Horizon Worlds, VR rooms, Social worlds

World models can supply the missing automation layer: worlds generated on demand, not only built by hand.

Current surface

AI-generated video and humans

AI video is the visible surface of the shift. The deeper issue is control, consistency, and memory across time.

Audio-driven identity

Expressive humans

EMO makes the control problem visible: the same identity needs to move, emote, sing, and stay coherent over time.

EMO, Digital humans, Talking avatars

If a generated person cannot persist, a generated world cannot feel stable.

Prompt-to-motion

Video models

Veo, Wan, Kling, Ray, and earlier systems like Sora turn text, images, audio, and references into moving scenes.

Veo 3.1, Wan2.7-Video, Kling, Ray, Sora

The next comparison is whether those scenes can be controlled, extended, and interacted with.

From avatar to actor

Digital characters

MetaHuman-style characters, Roblox avatars, and EMO-like portraits point to a future where generated characters need continuity.

MetaHuman, Roblox avatar, EMO portrait, Runway Characters

Characters are the social layer of generated worlds.

Spatial interface

Spatial computing and immersive access

Vision Pro and spatial computing are not the same thing as world models. They are how generated worlds may be seen and operated.

Computer as environment

Spatial computing

Apple Vision Pro reframes computing as something placed into space instead of locked inside a flat screen.

Apple Vision Pro, Spatial video, 3D interfaces

World models need interfaces where generated space can be inspected, edited, and inhabited.

Scene as data

3D reconstruction

NeRF, Gaussian splatting, and scan-to-3D workflows make real or imagined spaces computable.

NeRF, 3D Gaussian Splatting, LingBot-Map, HY-World 2.0

Generated worlds need spatial structure, not only pixels.

From screen to place

Immersive worlds

VR, AR, and mixed reality make the user feel located inside a generated or captured environment.

Meta Quest, Vision Pro, Immersive video

World models become more legible when users can enter and manipulate the output.

Industrial layer

Simulation, digital twins, and physical AI

The industrial version of world models is not entertainment. It is simulation for robots, vehicles, factories, and cities.

Real world mirror

Digital twins

Digital twins model real places and systems so teams can test changes before touching the physical world.

NVIDIA Omniverse, Factory twins, City simulation

World models can make simulations cheaper to create and easier to vary.

AI for embodied systems

Physical AI

Robots and autonomous vehicles need models of how environments respond to motion, contact, and decisions.

Cosmos, HY-Embodied-0.5, LingBot-VA, LingBot-VLA

This is where world models become training infrastructure, not just media.

World-scale spatial memory

Geospatial models

Large geospatial models connect AI to real-world places, maps, and location-aware behavior.

Niantic spatial AI, Maps, AR location layers

They turn the real world into a modelable environment.

Core capability

World models

The core shift is from generating isolated outputs to modeling how a world changes under time, viewpoint, and action.
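
To make "time, viewpoint, and action" concrete, here is a minimal sketch in Python. It is a toy illustration, not how any product named on this page works: the names WorldState, step, and render are hypothetical, and the world is just one agent on a 1-D track with a door. The point is the shape of the idea: one persistent state, a function that advances it under an action, and a function that draws it from a chosen viewpoint.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorldState:
    """Hypothetical toy world: one agent on a 1-D track, a door, and a clock."""
    t: int = 0
    agent_x: int = 0
    door_open: bool = False


def step(state: WorldState, action: str) -> WorldState:
    """Advance the world by one tick under an action (time + action)."""
    x = state.agent_x
    door = state.door_open
    if action == "move_right":
        x += 1
    elif action == "move_left":
        x -= 1
    elif action == "toggle_door":
        door = not door
    # Any other action (for example "wait") only advances the clock.
    return replace(state, t=state.t + 1, agent_x=x, door_open=door)


def render(state: WorldState, viewpoint: int, width: int = 7) -> str:
    """Draw the cells visible from `viewpoint`, the left edge of the view."""
    cells = []
    for x in range(viewpoint, viewpoint + width):
        cells.append("A" if x == state.agent_x else ".")
    door = "open" if state.door_open else "closed"
    return "".join(cells) + f" t={state.t} door={door}"


state = WorldState()
for action in ["move_right", "move_right", "toggle_door"]:
    state = step(state, action)

# The same state, seen from two viewpoints, stays consistent.
print(render(state, viewpoint=0))
print(render(state, viewpoint=2))
```

A clip generator would only produce the rendered frames. In this sketch the frames are a by-product of a state that persists between calls, which is why moving the viewpoint does not change where the agent is or whether the door is open.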

World responds to action

Interactive generation

A world model should preserve a coherent state when the user moves, edits, or acts.

Genie 3, Marble, HappyOyster, HY-World 2.0

Interaction is the difference between watching a clip and entering a system.

Base models for simulation

World foundation models

Foundation models can become reusable infrastructure for generating, predicting, and testing world states.

Cosmos, GWM-1, World API, HY-World 2.0, LingBot-VA, LingBot-VLA

This is where creative, spatial, and physical world generation begin to share a vocabulary.

World as training ground

Agent environments

Agents need environments where they can observe, act, fail, and learn.

Game worlds, Robot simulators, Interactive scenes

World models can become the substrate for training and evaluating future AI agents.
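
The observe, act, fail, and learn loop described for agent environments can be sketched in a few lines of Python. Everything here is an assumption made for illustration: DoorWorld, EliminationAgent, and run are hypothetical names, the environment is a hand-written toy rather than a generated world, and the agent learns by simple elimination of failed actions, not by a reinforcement learning algorithm.

```python
class DoorWorld:
    """Hypothetical toy environment: a row of doors, each with one safe action."""
    ACTIONS = ("a", "b", "c")

    def __init__(self, safe_actions):
        self.safe_actions = list(safe_actions)
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # observation: index of the current door

    def step(self, action):
        """Return (observation, done, success)."""
        if action != self.safe_actions[self.position]:
            return self.position, True, False  # wrong action: the episode fails
        self.position += 1
        done = self.position == len(self.safe_actions)
        return self.position, done, done


class EliminationAgent:
    """Remembers which actions failed at which door and avoids them."""

    def __init__(self, actions):
        self.actions = actions
        self.failed = set()  # (observation, action) pairs known to fail

    def act(self, observation):
        for action in self.actions:
            if (observation, action) not in self.failed:
                return action
        raise RuntimeError("no untried action left for this observation")

    def learn(self, observation, action, success):
        if not success:
            self.failed.add((observation, action))


def run(env, agent, max_episodes=20):
    """Run episodes until the agent succeeds; return the episode count."""
    for episode in range(1, max_episodes + 1):
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)                      # act on the observation
            next_obs, done, success = env.step(action)   # the world responds
            if done:
                agent.learn(obs, action, success)        # learn from failure
            obs = next_obs
        if success:
            return episode
    return None


env = DoorWorld(["b", "a", "c"])
agent = EliminationAgent(DoorWorld.ACTIONS)
episodes = run(env, agent)
print(f"solved in {episodes} episodes after {len(agent.failed)} failures")
```

The design choice worth noticing is the split between environment and agent: the environment owns the state and the rules, while the agent only sees observations and outcomes. A world model used as a training ground would take the place of DoorWorld in a loop like this.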

Bridge table

What each familiar concept contributes.

Entry concept | Known for | Connects to | Meaning inside world models
Blocky avatars / Minecraft | Simple identity inside a buildable world | Avatars, sandbox worlds, UGC | Generated worlds need persistent users, objects, and editable structure.
Metaverse | Persistent social virtual spaces | VR, Horizon Worlds, social identity | World models automate world creation instead of relying only on manual building.
Vision Pro | Spatial computing and immersive interface | AR, spatial video, 3D interaction | Generated worlds need a spatial interface for viewing, editing, and operation.
AI video | Generated motion, characters, and scenes | EMO, Veo 3.1, Wan2.7-Video, Kling, Ray | The video layer must become controllable, continuous, and stateful.
Digital twins | Simulation of real systems | Omniverse, robotics, LingBot-VA, LingBot-VLA, city and factory models | World models become useful when they predict and test real-world behavior.
World model | Predicting and generating world state | Genie 3, Marble, Cosmos, GWM-1 | The final category is not a place or device; it is the model that makes worlds behave.