01 Stage 1
AI video made synthetic reality visible.
The first mainstream mental model was simple: type a prompt, receive a moving scene. That made synthetic environments easy to see, share, and evaluate.
This evolution path, from AI video to world models, explains how generated clips become action-conditioned worlds, persistent 3D spaces, and agent societies.
From clip to place
Each page reads as a visual path first, then keeps the source-backed links nearby.
Stage 1
The limit of the prompt-to-clip model is that the viewer remains outside the frame: a clip can be impressive while still lacking controllable space, persistent state, or action feedback.
Stage 2
Genie, Oasis, and related systems make user action an input to generation, so what the user does shapes the frames that come next. Movement, steering, and camera control become core signals rather than afterthoughts.
This is the bridge from AI video toward world models. The user is no longer only watching a generated scene. The user is testing whether the scene behaves like a place.
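The difference between watching a clip and testing a place can be sketched as a toy program. This is an illustration under loud assumptions: `generate_clip`, `WorldState`, `step`, `render`, `rollout`, and `behaves_like_a_place` are hypothetical names, and the hand-written movement rules stand in for what Genie- or Oasis-style systems learn from data. The point is the shape of the interface: a clip generator takes only a prompt, while an action-conditioned model takes the current state plus a user action and returns the next state.

```python
from dataclasses import dataclass

# Hypothetical toy stand-ins. Real action-conditioned systems use learned
# neural generators, not hand-written rules like these.


def generate_clip(prompt: str, num_frames: int) -> list[str]:
    """Prompt-to-clip: every frame is fixed once the prompt is given."""
    return [f"{prompt} | frame {t}" for t in range(num_frames)]


@dataclass(frozen=True)
class WorldState:
    x: int = 0
    y: int = 0


# Hypothetical action vocabulary mapping each action to a grid offset.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


def step(state: WorldState, action: str) -> WorldState:
    """Action-conditioned transition: the next state depends on the user's action."""
    dx, dy = MOVES[action]
    return WorldState(state.x + dx, state.y + dy)


def render(state: WorldState) -> str:
    """Stand-in for the frame a model would generate from its current state."""
    return f"view from ({state.x}, {state.y})"


def rollout(state: WorldState, actions: list[str]) -> list[str]:
    """Apply a sequence of user actions and collect the rendered frames."""
    frames = []
    for action in actions:
        state = step(state, action)
        frames.append(render(state))
    return frames


def behaves_like_a_place(start: WorldState) -> bool:
    """A place-like probe: stepping away and back should restore the same view."""
    return rollout(start, ["left", "right"])[-1] == render(start)
```

Here `rollout(WorldState(), ["right", "right"])` returns `["view from (1, 0)", "view from (2, 0)"]`, and a different action sequence yields different frames, which a prompt-only clip cannot do. The `behaves_like_a_place` probe always passes for these exact rules; a learned generator may fail it, which is the kind of consistency a user is checking when they steer away and come back.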
Stage 3
Marble, HY-World 2.0, and other 3D world systems shift attention toward editable geometry, exported assets, larger spaces, and reusable world state.
Once a world can be returned to, edited, exported, or populated with agents, it becomes a platform surface rather than a one-time generated asset.
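What separates a platform surface from a one-time asset is state that outlives a single generation. The sketch below maps the four verbs in the paragraph above (returned to, edited, exported, populated with agents) onto a small persistent object. It assumes a hypothetical JSON schema; `World`, `place`, `spawn_agent`, `export`, and `load` are illustrative names, not the API or export format of Marble, HY-World 2.0, or any other real system.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical schema for illustration only; not any real system's format.


@dataclass
class World:
    name: str
    objects: dict[str, list[float]] = field(default_factory=dict)  # object id -> [x, y, z]
    agents: list[str] = field(default_factory=list)

    def place(self, obj_id: str, position: list[float]) -> None:
        """Edit: add or move an object in the persistent scene."""
        self.objects[obj_id] = list(position)

    def spawn_agent(self, agent_id: str) -> None:
        """Populate: register an agent that will act inside this world."""
        if agent_id not in self.agents:
            self.agents.append(agent_id)

    def export(self) -> str:
        """Export: serialize the full world state as JSON."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def load(cls, data: str) -> "World":
        """Return to: rebuild the same world from an exported snapshot."""
        raw = json.loads(data)
        return cls(name=raw["name"], objects=raw["objects"], agents=raw["agents"])
```

For example, after `world = World("harbor")`, `world.place("lighthouse", [10.0, 0.0, 4.0])`, and `world.spawn_agent("courier")`, the call `World.load(world.export())` produces a world equal to the original, and editing the restored copy leaves the original untouched. The design choice is that everything needed to resume the world lives in one serializable state, which is what lets it be revisited rather than regenerated.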