T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Kaiyue Sun¹, Kaiyi Huang¹, Xian Liu², Yue Wu³, Zihan Xu¹, Zhenguo Li³, Xihui Liu¹

¹ The University of Hong Kong, ² The Chinese University of Hong Kong, ³ Huawei Noah's Ark Lab

T2V-CompBench Prompt Suite.


Overview:

Introduction


Evaluation Metrics


MLLM-based evaluation metrics for consistent attribute binding, dynamic attribute binding, action binding, and object interactions.
Detection-based evaluation metrics for spatial relationships and generative numeracy.
Tracking-based evaluation metrics for motion binding.
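
To make the detection-based and tracking-based ideas above concrete, here is a minimal sketch. It is not the benchmark's implementation: it assumes an object detector has already produced per-frame bounding boxes and a tracker has already produced per-frame object centers, and it only illustrates how such outputs could be turned into a score. All function names (`is_left_of`, `spatial_score`, `motion_direction`) and the `min_shift` threshold are hypothetical choices made for this illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def is_left_of(box_a: Box, box_b: Box) -> bool:
    """True if box_a's center is left of box_b's center and the
    horizontal offset is larger than the vertical offset."""
    ax, ay = box_center(box_a)
    bx, by = box_center(box_b)
    return ax < bx and abs(ax - bx) > abs(ay - by)


def spatial_score(frames: List[Tuple[Box, Box]]) -> float:
    """Fraction of frames in which object A is left of object B."""
    if not frames:
        return 0.0
    return sum(is_left_of(a, b) for a, b in frames) / len(frames)


def motion_direction(centers: List[Point], min_shift: float = 5.0) -> str:
    """Classify net motion between the first and last tracked center.

    Uses image coordinates, where y grows downward. Shifts smaller than
    min_shift pixels (an assumed threshold) are treated as static.
    """
    if len(centers) < 2:
        return "static"
    dx = centers[-1][0] - centers[0][0]
    dy = centers[-1][1] - centers[0][1]
    if max(abs(dx), abs(dy)) < min_shift:
        return "static"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


if __name__ == "__main__":
    left, right = (0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)
    # The "left of" relation holds in two of the three frames.
    print(spatial_score([(left, right), (left, right), (right, left)]))
    print(motion_direction([(0.0, 0.0), (10.0, 1.0), (30.0, 2.0)]))
```

A video-level score for a prompt such as "A is to the left of B" could then be the per-frame agreement rate, and a motion-binding check could compare the classified direction with the direction named in the prompt. A real metric would also need to handle missed detections, camera motion, and multiple instances of the same object class, which this sketch ignores.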


Evaluation Results


Benchmarking T2V Models with a radar chart.

T2V-CompBench evaluation results for 23 T2V generation models (17 open-source models and 6 commercial models).



Bibtex


    @article{sun2024t2v,
      title={T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation},
      author={Sun, Kaiyue and Huang, Kaiyi and Liu, Xian and Wu, Yue and Xu, Zihan and Li, Zhenguo and Liu, Xihui},
      journal={arXiv preprint arXiv:2407.14505},
      year={2024}
    }