Global Force
Local Force
Arrows visualize the force input: their direction indicates the applied force direction, and changes in length indicate changes in force magnitude over time.
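The arrow encoding described above can be sketched in a few lines. This is a minimal illustration, not the project's rendering code: the helper name `force_to_arrow`, the `(angle, magnitude)` parameterization, and the pixel scale are all assumptions made for the example.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def force_to_arrow(
    origin: Point,
    angle_deg: float,
    magnitude: float,
    max_magnitude: float = 1.0,  # assumed normalization constant
    max_len_px: float = 80.0,    # assumed longest arrow, in pixels
) -> Tuple[Point, Point]:
    """Map a 2D force to an arrow segment (tail, tip) in image coordinates.

    Hypothetical helper: direction comes from the force angle and length
    scales linearly with the (clamped) force magnitude.
    """
    scale = max(0.0, min(magnitude / max_magnitude, 1.0))
    length = scale * max_len_px
    theta = math.radians(angle_deg)
    # The image y-axis points down, so a counter-clockwise angle reduces y.
    tip = (origin[0] + length * math.cos(theta),
           origin[1] - length * math.sin(theta))
    return origin, tip
```

Calling this once per frame with a changing `magnitude` yields an arrow whose length grows and shrinks over time while its direction tracks the force angle.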
Real-Time Interaction Demo
The demo shows how StreamForce consumes force inputs during generation, enabling users to steer evolving videos instead of specifying the entire motion sequence in advance.
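The difference between streaming control and specifying the whole motion up front can be illustrated with a small loop. This is a sketch under stated assumptions, not StreamForce's API: `stream_video`, `generate_chunk`, `read_force`, and the `(angle, magnitude)` force encoding are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Iterator, List, Tuple

Force = Tuple[float, float]  # assumed encoding: (angle_deg, magnitude)

def stream_video(
    generate_chunk: Callable[[List[Any], Force], Any],  # hypothetical model call
    read_force: Callable[[int], Force],                 # hypothetical input poll
    num_chunks: int,
) -> Iterator[Any]:
    """Yield video chunks one at a time, polling the force input at each step.

    Causal by construction: each chunk depends only on previously generated
    chunks and the force read at the current step, never on future forces.
    """
    history: List[Any] = []
    for step in range(num_chunks):
        force = read_force(step)                 # read at generation time
        chunk = generate_chunk(history, force)   # condition on past + current force
        history.append(chunk)
        yield chunk
```

Because the force is read inside the loop, a user can change it between chunks and the change affects only the chunks generated afterwards.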
Abstract
We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior approaches that train separate models for different force types, assume fixed forces, or rely on non-causal processing, StreamForce is a single causal model that responds immediately and coherently to time-varying forces, both local and global. To achieve this, we design a unified force representation as a control signal and develop a distillation pipeline for force-controllable video generation. Our model combines autoregressive efficiency with force responsiveness while maintaining stable photometric and dynamic realism. StreamForce runs at up to 16.6 FPS on a single GPU and achieves state-of-the-art performance in both force adherence and motion realism.
Physical Behaviors
StreamForce inherits physical priors from its pretrained video model. These priors support falling and bouncing, as well as distinct responses to the same force under different mass or friction.
Falling and Bouncing
A force pushes the object across a table; once it passes the edge, it falls under gravity and rebounds on the ground with a plausible loss of energy. This behavior emerges from the spatiotemporal priors of the pretrained video model.
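As a point of reference for what a plausible loss of energy looks like, the standard rigid-body idealization of a bounce uses a coefficient of restitution. The symbols below (drop height h, restitution coefficient e, rebound height h') are assumptions introduced for this illustration; the model itself is not given any such parameter.

```latex
\begin{aligned}
v_{\text{impact}} &= \sqrt{2 g h}, \\
v_{\text{rebound}} &= e \, v_{\text{impact}}, \qquad 0 \le e < 1, \\
h' &= \frac{v_{\text{rebound}}^{2}}{2 g} = e^{2} h, \\
\frac{\Delta E}{E} &= 1 - e^{2}.
\end{aligned}
```

Under this idealization each rebound is lower than the drop that preceded it, which is the qualitative pattern the generated video reproduces.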
Mass-Aware Motion
Under the same horizontal force, the glass containing milk moves more slowly than the empty glass, reflecting the expected relationship between object mass and acceleration. This behavior emerges from the model's physical priors, not from explicit mass conditioning.
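The expected relationship is Newton's second law. Neglecting friction, and writing the masses of the milk-filled and empty glasses as m_milk and m_empty (symbols introduced here for illustration), the same force F gives:

```latex
a = \frac{F}{m}
\quad\Longrightarrow\quad
\frac{a_{\text{milk}}}{a_{\text{empty}}}
  = \frac{F / m_{\text{milk}}}{F / m_{\text{empty}}}
  = \frac{m_{\text{empty}}}{m_{\text{milk}}} < 1
\quad \text{since } m_{\text{milk}} > m_{\text{empty}}.
```

The heavier glass therefore accelerates less and covers less distance in the same time.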
Friction-Aware Motion
The same horizontal force is applied to the same T-shaped object on two surfaces with different friction: the object travels a noticeably shorter distance on the higher-friction surface, reflecting friction opposing motion and dissipating kinetic energy.
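The effect of friction on travel distance can be checked with a simple point-mass calculation. This is a textbook idealization, not a description of how the model works: the function name `travel_distance`, the constant push of fixed duration, and the use of a single Coulomb friction coefficient for both static and kinetic friction are all assumptions.

```python
def travel_distance(
    force: float,      # constant horizontal push, in newtons
    mass: float,       # object mass, in kilograms
    mu: float,         # friction coefficient (assumed equal for static and kinetic)
    push_time: float,  # duration of the push, in seconds
    g: float = 9.81,   # gravitational acceleration, in m/s^2
) -> float:
    """Total distance a point mass travels when pushed, then left to slide.

    Hypothetical helper for illustration; requires mu > 0 so the object stops.
    """
    if mu <= 0:
        raise ValueError("mu must be positive for the object to come to rest")
    accel = force / mass - mu * g           # net acceleration while pushed
    if accel <= 0:
        return 0.0                          # friction holds the object in place
    speed = accel * push_time               # speed when the push ends
    pushed = 0.5 * accel * push_time ** 2   # distance covered during the push
    slid = speed ** 2 / (2 * mu * g)        # distance until friction stops it
    return pushed + slid
```

With the same force, mass, and push duration, a larger `mu` reduces both the speed reached and the sliding distance, so the total distance is shorter on the higher-friction surface.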
Baseline Comparisons
A four-way comparison of StreamForce against three baselines, Wan2.2 5B TI2V (text-only), Force-Prompting (bidirectional), and Kling Motion Brush, in both force-preservation and force-change settings, for both global and local forces.
Local Force Preservation
Global Force Preservation
Local Force Change
Global Force Change
Multi-Force and Part-Level Interaction
Applying two local forces simultaneously to different parts of a T-shaped object produces coordinated translation and rotation that drive the object toward a target position, demonstrating multi-force, part-level interaction.
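Why two forces on different parts produce both translation and rotation follows from rigid-body mechanics: the forces sum to a net force, and each force contributes a torque about the object's center. The sketch below illustrates that calculation in 2D; the function name `net_force_and_torque` and the coordinate convention (y-axis up, counter-clockwise torque positive) are assumptions, and the model does not compute these quantities explicitly.

```python
from typing import Iterable, Tuple

Vec2 = Tuple[float, float]

def net_force_and_torque(
    applied: Iterable[Tuple[Vec2, Vec2]],  # (application point, force) pairs
    center: Vec2 = (0.0, 0.0),             # assumed center of mass
) -> Tuple[Vec2, float]:
    """Sum 2D forces and their torques about `center` (hypothetical helper).

    Uses standard math axes (y up); positive torque is counter-clockwise.
    """
    fx_total = fy_total = torque = 0.0
    for (px, py), (fx, fy) in applied:
        rx, ry = px - center[0], py - center[1]  # lever arm from the center
        fx_total += fx
        fy_total += fy
        torque += rx * fy - ry * fx              # z-component of r x F
    return (fx_total, fy_total), torque
```

Two equal forces in the same direction at symmetric points give pure translation, two opposite forces give pure rotation, and unequal or asymmetric forces give a mix of both, which is what steers the T-shaped object toward a target pose.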
BibTeX
@misc{wang2026streamingvideogenerationstreaming,
  title={Streaming Video Generation with Streaming Force Control},
  author={Hanhui Wang and Yiming Xie and Haiwen Feng and Zhaoyang Lv and Shenlong Wang and Huaizu Jiang},
  year={2026},
  eprint={2606.07508},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.07508},
}