EndlessWorld: Real-Time 3D-Aware Long Video Generation

Checkpoint for EndlessWorld, a streaming video diffusion model that produces unbounded-length, 3D-consistent videos in real time on a single GPU.

What's in this repo

File       Description
model.pt   DMD-distilled generator weights for the EndlessWorld causal Wan model (step 1000 of the self_forcing_dmd_separate SOTA run).

This is the generator checkpoint only. To run inference you also need:

  1. The Wan2.1-T2V-1.3B base weights (text encoder, VAE, etc.)
  2. The AnySplat 3D Gaussian feature encoder

See the GitHub README for the full setup.

Method

EndlessWorld extends the Self-Forcing causal diffusion framework (Wan2.1-T2V-1.3B backbone) with a Global 3D-Aware Attention module that injects scene geometry, extracted on the fly by AnySplat, into the conditional embedding of every autoregressive chunk.
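
To make the idea of injecting geometry into the conditional embedding concrete, here is a minimal, dependency-free sketch of single-head cross-attention. It is an illustration, not the repository's module. It assumes that text tokens act as queries and 3D scene tokens as keys and values, and it uses identity projections with a residual connection. The function names `softmax` and `cross_attention_fuse` are hypothetical. The real module uses learned weights and may be wired differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(text_tokens, scene_tokens):
    """Fuse scene features into text tokens with single-head cross-attention.

    Assumption: text tokens are queries, scene tokens are keys and values,
    projections are the identity, and the result is added back residually.
    """
    dim = len(text_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in text_tokens:
        # Scaled dot-product score of this text token against every scene token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in scene_tokens]
        weights = softmax(scores)
        # Attention-weighted average of the scene tokens.
        attended = [sum(w * value[d] for w, value in zip(weights, scene_tokens))
                    for d in range(dim)]
        # Residual connection keeps the original text information.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


text = [[1.0, 0.0], [0.0, 1.0]]                 # two toy text tokens
scene = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]    # three toy scene tokens
fused = cross_attention_fuse(text, scene)
```

The fused output keeps the shape of the text embedding, so under this assumption it can replace the plain text condition without changing the generator's interface.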

[Figure: EndlessWorld pipeline]

Three ingredients:

  • Conditional autoregressive (self-forcing) training: frames are denoised block by block with a KV cache, conditioning each new block on previously generated content.
  • Global 3D-Aware Attention: CrossAttentionFusion and To3D modules ingest 3D Gaussian features produced by AnySplat and fuse them with the text embedding, giving the generator a persistent geometric memory of the world rendered so far.
  • Real-time streaming inference: the rollout loop re-extracts 3D features from the most recently decoded chunk and feeds the fused embedding back into the causal generator, enabling indefinite extension on a single GPU.
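
The control flow of the streaming loop described above can be sketched as follows. This is a toy outline under stated assumptions, not the code that `test.sh` runs. The three callables (`generate_chunk`, `extract_scene_features`, `fuse`) are hypothetical stand-ins for the causal generator, the 3D feature encoder, and the fusion module. The bounded `deque` standing in for the KV cache is also an assumption.

```python
from collections import deque


def stream_rollout(generate_chunk, extract_scene_features, fuse,
                   text_embedding, num_chunks, cache_size=4):
    """Yield chunks one at a time, conditioning each on earlier output.

    Hypothetical callables:
      generate_chunk(condition, cache) -> chunk
      extract_scene_features(chunk)    -> features
      fuse(text_embedding, features)   -> condition
    """
    cache = deque(maxlen=cache_size)  # bounded stand-in for the KV cache
    condition = text_embedding        # no scene features exist before chunk 0
    for _ in range(num_chunks):
        chunk = generate_chunk(condition, list(cache))
        cache.append(chunk)
        # Re-extract geometry from the newest chunk and refresh the condition.
        features = extract_scene_features(chunk)
        condition = fuse(text_embedding, features)
        yield chunk


# Toy usage with strings in place of tensors.
chunks = list(stream_rollout(
    generate_chunk=lambda cond, cache: f"{cond}|ctx={len(cache)}",
    extract_scene_features=lambda chunk: len(chunk),
    fuse=lambda text, feats: f"{text}+geo{feats}",
    text_embedding="prompt",
    num_chunks=3,
    cache_size=2,
))
```

Writing the loop as a generator means each chunk can be consumed as soon as it is produced. Memory stays constant as long as the cache is bounded.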

Quick start

git clone https://github.com/BWGZK-keke/EndlessWorld
cd EndlessWorld
pip install -r requirements.txt

# Download this checkpoint
huggingface-cli download BWGZK/EndlessWorld model.pt --local-dir checkpoints/

# Update configs/self_forcing_dmd.yaml -> generator_ckpt: checkpoints/model.pt
bash test.sh

Loading directly from Python:

import torch
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(repo_id="BWGZK/EndlessWorld", filename="model.pt")
state_dict = torch.load(ckpt, map_location="cpu")
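
# The object returned by torch.load may be a flat parameter mapping or a
# dictionary that wraps one. The layout of model.pt is not documented here,
# so the following is a hedged, dependency-free sketch for inspecting it.
# The helper names and the wrapper keys ("generator", "state_dict", "model")
# are assumptions, not confirmed for this checkpoint.

```python
def find_state_dict(checkpoint, wrapper_keys=("generator", "state_dict", "model")):
    """Return the innermost parameter mapping of a loaded checkpoint.

    wrapper_keys are guesses at common wrapper names; adjust them after
    looking at the top-level keys of the real file.
    """
    current = checkpoint
    while isinstance(current, dict):
        for key in wrapper_keys:
            if isinstance(current.get(key), dict):
                current = current[key]  # descend one wrapper level
                break
        else:
            return current  # no wrapper key found: treat as the parameters
    raise TypeError("checkpoint is not a dictionary")


def summarize(state_dict, limit=5):
    """Report how many entries there are and the first few sorted names."""
    names = sorted(state_dict)
    return {"num_entries": len(names), "first_names": names[:limit]}


# Toy stand-in for a loaded checkpoint; the real values would be tensors.
toy_checkpoint = {
    "generator": {"blocks.0.weight": [0.0], "blocks.0.bias": [0.0]},
    "step": 1000,
}
params = find_state_dict(toy_checkpoint)
summary = summarize(params)
```

# With the real file, calling summarize(find_state_dict(state_dict)) gives a
# quick check that the download completed and that the parameter names match
# what the repository's model expects.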

Training

Citation

@article{zhang2025endlessworld,
  title   = {Endless World: Real-Time 3D-Aware Long Video Generation},
  author  = {Zhang, Ke and others},
  journal = {arXiv preprint arXiv:2512.12430},
  year    = {2025}
}

License

Apache 2.0, the same as the upstream Wan2.1 and Self-Forcing projects.
