Fig. 1 - Teaser: StreamForce

A user-applied wind force directed to the right (->) gradually increases in magnitude until it triggers a smoke alarm. StreamForce responds online as the force changes, unlike bidirectional baselines, which require the full input sequence up front.
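A toy Python sketch can illustrate the difference between online (causal) and bidirectional generation. It is not StreamForce's implementation. `causal_stream`, `bidirectional_batch`, the `step` callback, and the (fx, fy) force tuples are hypothetical stand-ins. The sketch only shows the dependency structure: a causal generator can emit frame t as soon as force t arrives, whereas a bidirectional model needs the whole force sequence before it produces any frame.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical stand-ins for illustration only: a force is an (fx, fy) tuple and a
# "frame" is whatever the step callback returns. Not StreamForce's actual interfaces.
Force = Tuple[float, float]
Step = Callable[[List[Force]], object]


def causal_stream(forces: Iterable[Force], step: Step) -> Iterator[object]:
    """Causal (online) generation: emit one frame per incoming force.

    Frame t is computed from forces[0..t] only, so it can be produced before
    later forces exist, e.g. while a user is still adjusting the force.
    """
    seen: List[Force] = []
    for force in forces:
        seen.append(force)
        yield step(list(seen))  # pass a copy: the frame sees only past forces


def bidirectional_batch(forces: Sequence[Force], step: Step) -> List[object]:
    """Bidirectional generation: every frame may depend on the full force sequence,
    so the entire sequence must be known before any frame is produced."""
    full = list(forces)
    return [step(full) for _ in full]
```

Because `causal_stream` is a generator, a caller can pull the first frame after supplying only the first force. `bidirectional_batch` cannot start until its input sequence is complete.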

Smoke alarm scenario - head-to-head

StreamForce (Ours) - causal, 16.6 FPS, 0.6 s latency
Force-Prompting (bidirectional)
Kling Motion Brush

Additional teaser examples (StreamForce)

Diverse scenes showcasing global (e.g. wind, smoke) and local (e.g. drawer, glass, boxes) force control from a single input image.

Girl with pearl earring
Pearl
Smoke
Drawer (local force)
Glass (local force)
Boxes (local force)

Fig. 3 - Visual Comparison vs. Baselines

Four-way comparison of StreamForce against Wan2.2 5B TI2V (text-only), Force-Prompting (bidirectional), and Kling Motion Brush, covering both force-preservation and force-change settings for global and local forces.

Local Force - Preservation

Wan2.2 5B TI2V
Force-Prompting
Kling Motion Brush
StreamForce (Ours)

Global Force - Preservation

Wan2.2 5B TI2V
Force-Prompting
Kling Motion Brush
StreamForce (Ours)

Local Force - Change (time-varying force)

Wan2.2 5B TI2V
Force-Prompting
Kling Motion Brush
StreamForce (Ours)

Global Force - Change (time-varying force)

Wan2.2 5B TI2V
Force-Prompting
Kling Motion Brush
StreamForce (Ours)

Physics-IQ Benchmark - Local Force (with ground truth)

Force-preservation cases recorded under controlled conditions; the ground-truth videos serve as a reference.

Wan2.2 5B TI2V
Force-Prompting
StreamForce (Ours)
Ground Truth

Physics-IQ Benchmark - Global Force (with ground truth)

Wan2.2 5B TI2V
Force-Prompting
StreamForce (Ours)
Ground Truth

Fig. 4 - T-Pushing Manipulation

Applying two local forces simultaneously to different parts of a T-shaped object produces coordinated translation and rotation that drive the object toward a target position - demonstrating multi-force, part-level interaction.
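A rigid-body idealization gives the intuition for why two pushes can produce both translation and rotation: the sum of the forces moves the center of mass, and the sum of their torques about the center of mass turns the object. StreamForce does not compute these quantities explicitly; the 2D sketch below, including `PointForce` and `net_force_and_torque`, is a hypothetical illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointForce:
    """A 2D force (fx, fy) applied at (px, py), measured from the center of mass."""
    px: float
    py: float
    fx: float
    fy: float


def net_force_and_torque(forces: List[PointForce]) -> Tuple[float, float, float]:
    """Return (Fx, Fy, tau): net force drives translation, net torque drives rotation."""
    fx = sum(f.fx for f in forces)
    fy = sum(f.fy for f in forces)
    # z-component of r x F in 2D; positive means counterclockwise rotation.
    tau = sum(f.px * f.fy - f.py * f.fx for f in forces)
    return fx, fy, tau


# Equal and opposite pushes at opposite ends: no net force, pure rotation.
couple = [PointForce(1.0, 0.0, 0.0, 1.0), PointForce(-1.0, 0.0, 0.0, -1.0)]
print(net_force_and_torque(couple))  # (0.0, 0.0, 2.0)
```

Two parallel pushes placed symmetrically about the center of mass give the opposite case, net force with zero torque, i.e. pure translation. Combining the two yields the coordinated translation and rotation seen in T-pushing.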

T-Pushing with two local forces

Fig. 5 - Object Falling and Bouncing

A force pushes the object across a table; once it passes the edge, it falls under gravity and rebounds off the ground with a plausible loss of energy. This behavior emerges from the spatiotemporal priors of the pretrained video model. (The bouncing is more clearly visible in the videos below than in the static paper figure.)
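The "plausible loss of energy" can be made concrete with the textbook model of a bouncing ball. This sketch is not part of StreamForce, which has no explicit physics parameters. The coefficient of restitution `e` and the function `bounce_peaks` are illustrative assumptions, and horizontal motion and air drag are ignored.

```python
from typing import List


def bounce_peaks(h0: float, e: float, n_bounces: int) -> List[float]:
    """Peak heights of an idealized point mass dropped from rest at height h0.

    Assumptions (illustrative only): no air drag, and each impact reverses the
    vertical velocity and scales its magnitude by a constant coefficient of
    restitution e. Since peak height grows with the square of launch speed
    (h = v**2 / (2 g)), each peak is e**2 times the previous one, and e**2 is
    also the fraction of mechanical energy kept per bounce.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("coefficient of restitution must lie in [0, 1]")
    peaks = [h0]
    for _ in range(n_bounces):
        peaks.append(peaks[-1] * e * e)
    return peaks


# Example: a drop from 1.0 m with e = 0.5 rebounds to 0.25 m, then 0.0625 m.
print(bounce_peaks(1.0, 0.5, 2))  # [1.0, 0.25, 0.0625]
```

With e < 1, each rebound is lower than the last, which is the qualitative pattern a plausible generated bounce should follow.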

Glass pushed off a table - falls
Object pushed off an edge - falls and bounces

Fig. 6 - Ablation: Unified Force Representation vs. Separate / Force-Prompting

Comparing four configurations: F-Prompt Rep. (the Force-Prompting representation); Separate (independent models for global and local forces); Ours (teacher), the unified bidirectional teacher; and Ours, the causal autoregressive student.

Local Force

F-Prompt Rep.
Separate Model
Ours (teacher, unified)
Ours

Global Force

F-Prompt Rep.
Separate Model
Ours (teacher, unified)
Ours

Fig. 7 - Ablation: Diverse Image-Force Data & Force-Changing Data

Removing diverse image-force data during distillation (w/o Diverse) reduces motion variety and adaptability. Removing force-changing training data (w/o Change) causes the model to largely ignore mid-sequence force updates.

w/o Diverse Data - Local

w/o Diverse
Ours (with diverse data)

w/o Diverse Data - Global

w/o Diverse
Ours (with diverse data)

w/o Force-Changing Data - Local

w/o Change
Ours (with change data)

w/o Force-Changing Data - Global

w/o Change
Ours (with change data)

Fig. 8 - Mass-aware Motion Behavior

Under the same horizontal force, the glass containing milk moves more slowly than the empty glass, reflecting the expected relationship between object mass and acceleration. This behavior emerges from the model's physical priors - not from any explicit mass conditioning.
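The expected relationship is Newton's second law. In the idealized case of a frictionless push from rest (an assumption for illustration only; the model receives no mass input), the displacement after time t scales inversely with mass, so the heavier glass covers less distance in the same time:

```latex
a = \frac{F}{m}, \qquad x(t) = \tfrac{1}{2} a t^{2} = \frac{F t^{2}}{2m}
\quad\Longrightarrow\quad
\frac{x_{\text{milk}}(t)}{x_{\text{empty}}(t)} = \frac{m_{\text{empty}}}{m_{\text{milk}}} < 1 .
```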

Empty glass - moves faster under the same force
Glass with milk - moves slower (heavier mass)

Fig. 9 - Friction-aware Motion Behavior

The same horizontal force is applied to the same T-shaped object on two surfaces with different friction. The object travels a noticeably shorter distance on the higher-friction surface, consistent with friction opposing motion and dissipating kinetic energy.
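A textbook model with Coulomb friction shows why the same push travels a shorter distance on a higher-friction surface. This is an illustrative assumption, not StreamForce's mechanism: `slide_distance` is hypothetical, static and kinetic friction share one coefficient `mu`, and the push is a constant force held for `push_time` seconds before release.

```python
G = 9.81  # gravitational acceleration, m/s^2


def slide_distance(force: float, mass: float, mu: float, push_time: float) -> float:
    """Total sliding distance (m) of a block pushed horizontally, then released.

    Phase 1: constant push; net acceleration is force/mass - mu*G.
    Phase 2: after release, friction alone decelerates the block at mu*G
    until it stops, covering v**2 / (2*mu*G).
    """
    if mu <= 0.0:
        raise ValueError("mu must be positive (a frictionless block never stops)")
    friction_acc = mu * G
    a_push = force / mass - friction_acc
    if a_push <= 0.0:
        return 0.0  # the push never overcomes friction
    v_release = a_push * push_time
    d_push = 0.5 * a_push * push_time ** 2
    d_coast = v_release ** 2 / (2.0 * friction_acc)
    return d_push + d_coast


# Same 10 N push for 1 s on a 1 kg block: the lower-friction surface slides farther.
low = slide_distance(10.0, 1.0, mu=0.2, push_time=1.0)
high = slide_distance(10.0, 1.0, mu=0.6, push_time=1.0)
print(f"{low:.2f} m vs {high:.2f} m")
```

Higher `mu` hurts twice: it reduces the net acceleration during the push and shortens the coasting distance after release.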

Beach (lower friction) - travels farther
Rocks (higher friction) - travels a shorter distance

Fig. A2 - Magnitude Response Comparisons (Supplementary)

Comparison of Force-Prompting and StreamForce under weaker and stronger force magnitudes. StreamForce responds more clearly to differences in force magnitude than the baseline.

Larger force

Force-Prompting
Ours

Smaller force

Force-Prompting
Ours

Project Demo

End-to-end framework overview and live demonstrations.

StreamForce demo reel

Local Force Preservation - Additional Examples

Localized force applied to specific image regions, with the force held constant throughout the sequence. 242 examples (eight cases moved to the Failure Cases tab).

Local Force Change - Additional Examples

Localized force whose magnitude and/or direction varies over time within a single generated sequence. 246 examples (three cases moved to the Failure Cases tab).

Global Force Preservation - Additional Examples

Global forces (e.g. wind) applied uniformly across the scene, held constant throughout the sequence. 135 examples (one case moved to the Failure Cases tab).

Global Force Change - Additional Examples

Global forces whose magnitude and/or direction varies over time during generation. 151 examples.

Failure Cases

Representative cases where StreamForce does not respond correctly to the applied force. Common failure modes include: (1) partial-object detachment, where StreamForce moves part of an object that should remain attached to the main object; (2) force leakage to nearby small objects, where small objects close to the force-application point are unintentionally moved by the force; and (3) implausible motion responses for certain object or scene configurations.

Local Preserve - #26
Local Preserve - #71
Local Preserve - #84
Local Preserve - #112
Local Preserve - #145
Local Preserve - #146
Local Preserve - #147
Local Preserve - #210
Local Change - #104
Local Change - #181
Local Change - #211
Global Preserve - #90