Expressive portrait video model

EMO AI video model

Start from a moving scene, then watch the category push toward control, identity, and continuity.

Alibaba Group, Institute for Intelligent Computing
Research project and Alibaba Cloud Model Studio API
Research project page, GitHub repository, arXiv paper, and Alibaba Cloud Model Studio API documentation.
Generated media

What this lets people do

Audio-driven humans, facial expression, head motion, identity persistence, and long-duration portrait animation.

Scene explainer

Three frames before the source list.

The page starts with the experience, then moves toward source-backed details.

01

First impression

A visible world

EMO, short for Emote Portrait Alive, generates expressive portrait videos from a single reference image and vocal audio.

02

Capability

Why it stands out

Makes the future of controllable video immediately understandable through a strong visual demo.

03

Boundary

What not to overclaim

EMO is not a complete world model; it focuses on portrait animation rather than explorable environments.

Good reasons to open this page

  • Visitors who want a quick visual sense of what this class of model does.
  • Creators comparing whether the output feels like a clip, a place, or a controllable world.
  • Readers who need status and sources after the first impression.

Strengths

  • Makes the future of controllable video immediately understandable through a strong visual demo.
  • Shows why identity, expression, audio alignment, and duration matter for the path from clips to stateful worlds.
  • Useful as a homepage signal for the human side of video world modeling.

Limits and source boundary

  • EMO is not a complete world model; it focuses on portrait animation rather than explorable environments.
  • Compare it as an example of controllable human video, not as a replacement for Genie 3, Marble, or Cosmos.

Sources

FAQ

Dossier FAQ

Use these notes to keep model comments grounded in official sources and careful category boundaries.

Definition

What does World Models Watch count as a world model?

The site tracks systems that model environments, actions, spatial structure, or persistent simulated state. Pure text chatbots and ordinary video generators are only included when they provide a clear bridge toward interactive or physical world modeling.

Category boundary

Why do some AI video systems appear on a world-model site?

Video models are included only when they help explain the path from generated clips to controllable spaces, physics-aware prediction, or agent-ready simulation. The site keeps that distinction explicit so video generation is not overstated as a finished world simulator.

Editorial policy

How does the site decide whether a release is reliable enough to list?

Primary sources carry the most weight: official product pages, research posts, papers, documentation, code repositories, and company announcements. Secondary media can be referenced, but it stays labeled as reported or adjacent unless independently confirmed.

Community

What should readers post in comments?

Useful comments add source links, corrections, release-status notes, comparison questions, or concrete reader context. Comments are public immediately, so readers should avoid private information and unsupported promotional claims.
