StreamForce: Streaming Video Generation with Streaming Force Control

Global Force

Local Force

Arrows visualize the force input: their direction indicates the applied force direction, and changes in length indicate changes in force magnitude over time.

Real-Time Interaction Demo

The demo shows how StreamForce consumes force inputs during generation, enabling users to steer evolving videos instead of specifying the entire motion sequence in advance.

Abstract

We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior approaches that train separate models for different force types, assume fixed forces, or rely on non-causal processing, StreamForce is a causal and unified model that responds instantly and coherently to both local and global, time-varying forces. To achieve this, we design a unified force representation as a control signal and develop a distillation pipeline for force-controllable video generation. Our model combines autoregressive efficiency with force responsiveness, sustaining stable photometric and dynamic realism. StreamForce runs at up to 16.6 FPS on a single GPU, achieving state-of-the-art performance in both force adherence and motion realism.

Physical Behaviors

StreamForce inherits physical priors from the pretrained video model. These priors support falling and bouncing, as well as different responses under varying mass or friction.

Falling and Bouncing

Falling glass

Falling and bouncing object

A force pushes the object across a table; once it passes the edge, it falls under gravity and rebounds on the ground with a plausible loss of energy. This behavior emerges from the spatiotemporal priors of the pretrained video model.

Mass-Aware Motion

Empty glass: faster motion

Glass with milk: slower motion

Under the same horizontal force, the glass containing milk moves more slowly than the empty glass, reflecting the expected relationship between object mass and acceleration. This behavior emerges from the model's physical priors, not from explicit mass conditioning.
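The mass-acceleration relationship mentioned above is Newton's second law, a = F / m. The following is a minimal sketch of that relationship only; it is not part of StreamForce, and the force and mass values are hypothetical numbers chosen for illustration.

```python
def acceleration(force_newtons: float, mass_kg: float) -> float:
    """Newton's second law for a single horizontal force: a = F / m."""
    if mass_kg <= 0:
        raise ValueError("mass must be positive")
    return force_newtons / mass_kg


# Hypothetical values, not measurements from the demo videos.
FORCE = 1.0         # applied horizontal force in newtons
EMPTY_GLASS = 0.25  # assumed mass of the empty glass in kg
MILK_GLASS = 0.5    # assumed mass of the glass with milk in kg

a_empty = acceleration(FORCE, EMPTY_GLASS)
a_milk = acceleration(FORCE, MILK_GLASS)

# The same force accelerates the heavier glass less.
print(f"empty glass: {a_empty} m/s^2, glass with milk: {a_milk} m/s^2")
```

Doubling the mass halves the acceleration under the same force, which matches the qualitative ordering seen in the two clips.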

Friction-Aware Motion

Lower-friction surface: travels farther

Higher-friction surface: travels a shorter distance
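The qualitative ordering in these two clips agrees with the textbook model of an object sliding to rest under constant kinetic friction, where the stopping distance is d = v0^2 / (2 * mu * g). The sketch below illustrates that formula only; the initial speed and friction coefficients are hypothetical, and nothing here describes how StreamForce represents friction.

```python
G = 9.81  # gravitational acceleration in m/s^2


def sliding_distance(initial_speed: float, friction_coeff: float) -> float:
    """Distance to stop under constant kinetic friction: d = v0^2 / (2 * mu * g)."""
    if friction_coeff <= 0:
        raise ValueError("friction coefficient must be positive")
    return initial_speed ** 2 / (2.0 * friction_coeff * G)


# Hypothetical values chosen only to show the trend.
V0 = 2.0            # speed in m/s at the moment the push ends
LOW_FRICTION = 0.1  # assumed coefficient for the smoother surface
HIGH_FRICTION = 0.5 # assumed coefficient for the rougher surface

d_low = sliding_distance(V0, LOW_FRICTION)
d_high = sliding_distance(V0, HIGH_FRICTION)

# Lower friction yields the longer slide.
print(f"low friction: {d_low:.2f} m, high friction: {d_high:.2f} m")
```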

StreamForce (Ours): causal streaming force control

This section highlights the core streaming setting: forces arrive while the video is being generated, and users can modify them at any time to steer the future rollout. Because StreamForce is causal, it reacts online to changing force inputs. Bidirectional baselines require the full force sequence up front and therefore cannot naturally support the same real-time interaction.
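The causal setting described above can be illustrated with a toy rollout in which each step reads only the forces received so far. This is a sketch under stated assumptions, not the StreamForce model: a point mass stands in for the video generator, and `stream_rollout` and `user_forces` are hypothetical names introduced here.

```python
from typing import Iterable, Iterator, Tuple

Force = Tuple[float, float]                # (fx, fy)
State = Tuple[float, float, float, float]  # (x, y, vx, vy)


def stream_rollout(
    forces: Iterable[Force], dt: float = 0.1, mass: float = 1.0
) -> Iterator[State]:
    """Yield one state per incoming force (semi-implicit Euler integration).

    Toy stand-in for a causal generator: step t depends only on forces 0..t,
    so the output can be emitted before later forces are known.
    """
    x = y = vx = vy = 0.0
    for fx, fy in forces:
        vx += fx / mass * dt
        vy += fy / mass * dt
        x += vx * dt
        y += vy * dt
        yield (x, y, vx, vy)


def user_forces() -> Iterator[Force]:
    """Hypothetical user input: push right for three steps, then push up."""
    for _ in range(3):
        yield (1.0, 0.0)
    for _ in range(3):
        yield (0.0, 1.0)


states = list(stream_rollout(user_forces()))
for step, state in enumerate(states):
    print(step, state)
```

Because the rollout is a generator over an iterable, the force source can be a live input stream. A bidirectional model would instead need the whole force sequence before producing the first frame.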

In the smoke-alarm example, a user-applied wind force directed toward the right gradually increases in magnitude. StreamForce updates the generated dynamics as the force changes.

Multi-Force and Part-Level Interaction

Applying two local forces simultaneously to different parts of a T-shaped object produces coordinated translation and rotation that drive the object toward a target position, demonstrating multi-force, part-level interaction.
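Rigid-body mechanics explains why two local forces can produce both translation and rotation: the forces sum to a net force, and each force also contributes a torque about the center of mass. The sketch below computes both quantities in 2D. The application points, force values, and the assumption that the center of mass sits at the origin are hypothetical and do not come from the demo.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def net_force_and_torque(
    points: Sequence[Vec2], forces: Sequence[Vec2], center: Vec2
) -> Tuple[Vec2, float]:
    """Return the summed force and the total torque about `center`.

    In 2D the torque of force (fx, fy) applied at offset (rx, ry) is the
    scalar cross product rx * fy - ry * fx (positive = counterclockwise).
    """
    fx_total = fy_total = torque = 0.0
    for (px, py), (fx, fy) in zip(points, forces):
        fx_total += fx
        fy_total += fy
        rx, ry = px - center[0], py - center[1]
        torque += rx * fy - ry * fx
    return (fx_total, fy_total), torque


# Hypothetical T-shaped object with its center of mass assumed at the origin.
points = [(-1.0, 1.0), (0.0, -1.0)]  # tip of the left arm, bottom of the stem
forces = [(2.0, 0.0), (1.0, 0.0)]    # both push right, with unequal magnitude

net, torque = net_force_and_torque(points, forces, center=(0.0, 0.0))
print("net force:", net, "torque:", torque)
```

With these values the net force points right while the torque is negative, so the object would translate right and rotate clockwise at the same time.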

BibTeX


@misc{wang2026streamingvideogenerationstreaming,
  title={Streaming Video Generation with Streaming Force Control},
  author={Hanhui Wang and Yiming Xie and Haiwen Feng and Zhaoyang Lv and Shenlong Wang and Huaizu Jiang},
  year={2026},
  eprint={2606.07508},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.07508},
}