SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

¹UC San Diego, ²Northeastern University

Abstract

Human-centric video frame interpolation has great potential for improving people's entertainment experiences and finding commercial applications in the sports analysis industry, e.g., synthesizing slow-motion videos. Although multiple benchmark datasets are available in the community, none of them is dedicated to human-centric scenarios. To bridge this gap, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos crawled from YouTube. We re-train several state-of-the-art methods on our benchmark, and the results show a decrease in their accuracy compared to other datasets. This highlights the difficulty of our benchmark and suggests that it poses significant challenges even for the best-performing methods, as human bodies are highly deformable and occlusions are frequent in sports videos. To improve the accuracy, we introduce two loss terms based on human-aware priors, which add auxiliary supervision on panoptic segmentation and human keypoint detection, respectively. The loss terms are model-agnostic and can be easily plugged into any flow-based video frame interpolation approach. Experimental results validate the effectiveness of our proposed loss terms, leading to strong baseline models on our benchmark.

SportsSloMo Benchmark

Although various benchmarks are available for video frame interpolation, none of them is dedicated to human-centric scenarios. To bridge this gap and to foster research in this important direction, we create a new dataset, SportsSloMo, focusing on high-resolution (≥720p) slow-motion sports videos crawled from YouTube under the Creative Commons license. In total, our benchmark has 130K video clips and more than 1M video frames. Compared with existing datasets, our proposed SportsSloMo benchmark is the largest so far, with high resolution and a focus on human-centric scenarios.

Flow Magnitude Distribution

As shown by the histogram of flow magnitude, our proposed SportsSloMo dataset contains more large-displacement motion than widely used VFI datasets.

[Figure: Histogram of flow magnitude]
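For readers who want to compute a comparable statistic on their own clips, the following is a minimal sketch of how a flow magnitude histogram can be built. It assumes the optical flow between two frames has already been estimated by some external method (this page does not state which estimator was used) and is available as a grid of per-pixel (u, v) displacements in pixels. The function names, the bin width, and the toy values are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# H x W grid of (u, v) displacements in pixels (assumed input format).
Flow = Sequence[Sequence[Tuple[float, float]]]


def flow_magnitudes(flow: Flow) -> List[float]:
    """Return the per-pixel magnitude sqrt(u^2 + v^2), flattened row by row."""
    return [math.hypot(u, v) for row in flow for (u, v) in row]


def magnitude_histogram(
    magnitudes: Sequence[float], bin_width: float, num_bins: int
) -> List[float]:
    """Normalized histogram of magnitudes.

    Values beyond the last bin edge are clamped into the last bin, so the
    final entry collects all large-displacement pixels.
    """
    counts = [0] * num_bins
    for m in magnitudes:
        idx = min(int(m // bin_width), num_bins - 1)
        counts[idx] += 1
    total = len(magnitudes)
    return [c / total for c in counts] if total else [0.0] * num_bins


# Toy 2 x 2 flow field (hypothetical values, not taken from SportsSloMo).
toy_flow = [
    [(3.0, 4.0), (0.0, 0.0)],
    [(6.0, 8.0), (30.0, 40.0)],
]
mags = flow_magnitudes(toy_flow)
hist = magnitude_histogram(mags, bin_width=5.0, num_bins=4)
```

A dataset with more large-displacement motion puts more mass in the higher bins of such a histogram. For real frames, an array library would typically replace the nested lists; plain Python is used here only to keep the sketch dependency-free.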

Sports Categories Distribution

Our proposed SportsSloMo dataset covers 22 sports categories with different content and motion patterns, including hockey, baseball, skating, basketball, running, and volleyball.

[Figure: Distribution of sports categories]

Overview Video

Data

Video sequences (8.5 GB): Link

Video frame interpolation split file: Link

BibTeX

@InProceedings{Chen_2023_sportsslomo,
    author    = {Chen, Jiaben and Jiang, Huaizu},
    title     = {SportsSloMo: A New Benchmark and Baseline Models for Human-centric Video Frame Interpolation},
    booktitle = {arXiv},
    year      = {2023}
}