SMooDi: Stylized Motion Diffusion Model

Northeastern University, Stability AI, Google Research
ECCV 2024

SMooDi can generate stylized motion given a content text and a style motion sequence.

Style labels are not used as model input and are shown here for visualization purposes only.

Abstract

We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-motion model for stylization. Specifically, we propose style guidance to ensure that the generated motion closely matches the reference style, alongside a lightweight style adaptor that directs the motion towards the desired style while ensuring realism. Experiments across various applications demonstrate that our proposed framework outperforms existing methods in stylized motion generation.
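
To make the style guidance concrete: one common way to combine content and style conditioning in a diffusion model is dual classifier-free guidance. The formulation below is our illustrative assumption, not necessarily the paper's exact scheme:

    \hat{\epsilon} = \epsilon_\theta(z_t, \varnothing, \varnothing)
                   + w_c \, \big( \epsilon_\theta(z_t, c, \varnothing) - \epsilon_\theta(z_t, \varnothing, \varnothing) \big)
                   + w_s \, \big( \epsilon_\theta(z_t, c, s) - \epsilon_\theta(z_t, c, \varnothing) \big)

Here c is the content text, s the style motion sequence, z_t the noisy latent at step t, and w_c, w_s the content and style guidance weights.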

Method

Our model generates stylized human motions from a content text and a style motion sequence. At each denoising step, the model takes the content text, style motion, and noisy latent as input and predicts the noise, which is then used to compute the noisy latent for the next step. This denoising step is repeated T times to obtain a noise-free motion latent, which is fed into a motion decoder to produce the final stylized motion.
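
A minimal sketch of this sampling loop, assuming hypothetical denoiser, scheduler, and motion_decoder objects (these names are ours, not the authors' actual API):

    import torch

    @torch.no_grad()
    def generate_stylized_motion(denoiser, scheduler, motion_decoder,
                                 text_emb, style_motion, latent_shape, T=50):
        # Start from a Gaussian-noise latent.
        z = torch.randn(latent_shape)
        # Repeat the denoising step T times.
        for t in reversed(range(T)):
            # Predict the noise from content text, style motion, and noisy latent.
            eps = denoiser(z, t, text_emb, style_motion)
            # Use the predicted noise to compute the noisy latent for the next step.
            z = scheduler.step(eps, t, z)
        # Decode the noise-free motion latent into the final stylized motion.
        return motion_decoder(z)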


Comparison with Other Methods on Stylized Text2Motion


Our method is compared against baselines that generate stylized motion from content text and style sequences. These baselines apply motion style transfer methods, Motion Puzzle [Jang et al. (2022)] and [Aberman et al. (2020)], to motion generated by MLD [Chen et al. (2023)].

Qualitative comparisons (videos on the project page) for four content texts: "A person walks backward.", "A person bows.", "A person walks forward and then sit down.", and "A person crawls on the floor." Each example shows the style motion alongside results from MLD, MLD + Motion Puzzle, MLD + [Aberman et al.], and Ours.


Comparison with Other Methods on Motion Style Transfer


SMooDi also supports motion style transfer by using DDIM inversion to obtain the noised latent code of the content motion sequence. Here, we adapt Motion Puzzle [Jang et al. (2022)] and [Aberman et al. (2020)] for comparison.
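
A minimal sketch of this transfer pipeline, continuing the hypothetical API from the sketch above; the motion_encoder and the scheduler's inverse_step method are likewise our assumptions, not the paper's actual code:

    @torch.no_grad()
    def motion_style_transfer(denoiser, scheduler, motion_encoder, motion_decoder,
                              content_motion, style_motion, text_emb, T=50):
        # Encode the content motion into a clean latent.
        z = motion_encoder(content_motion)
        # DDIM inversion: run the deterministic diffusion forward in time
        # to recover the noised latent code of the content motion.
        for t in range(T):
            eps = denoiser(z, t, text_emb, style_motion=None)
            z = scheduler.inverse_step(eps, t, z)
        # Denoise back while conditioning on the style motion.
        for t in reversed(range(T)):
            eps = denoiser(z, t, text_emb, style_motion)
            z = scheduler.step(eps, t, z)
        return motion_decoder(z)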


Two motion style transfer examples (videos on the project page), each showing the style motion and content motion alongside results from Motion Puzzle, [Aberman et al.], and Ours.



Visual Results of Each Guidance




Ablation Studies


Concurrent Stylized Motion Works

  • (ArXiv 2024) On-The-Fly Learning To Transfer Motion Style With Diffusion Models: A Semantic Guidance Approach
  • (ArXiv 2024) Generative Motion Stylization within Canonical Motion Space
  • (ICLR 2024) Generative Human Motion Stylization in Latent Space
  • (CVPR 2024) MoST: Motion Style Transformer between Diverse Action Contents
  • (CVPR 2024) Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model

BibTeX

    @article{zhong2024smoodi,
      title={SMooDi: Stylized Motion Diffusion Model},
      author={Zhong, Lei and Xie, Yiming and Jampani, Varun and Sun, Deqing and Jiang, Huaizu},
      journal={arXiv preprint arXiv:2407.12783},
      year={2024}
    }