Multimodal Dance Generation Networks Based on Audio-Visual Analysis

Author(s): Lijuan Duan (Beijing University of Technology, China), Xiao Xu (Beijing University of Technology, China), and Qing En (Beijing University of Technology, China)
Copyright: 2021
Volume: 12
Issue: 1
Pages: 16
Source title: International Journal of Multimedia Data Engineering and Management (IJMDEM)
Editor(s)-in-Chief: Chengcui Zhang (University of Alabama at Birmingham, USA) and Shu-Ching Chen (University of Missouri-Kansas City, USA)
DOI: 10.4018/IJMDEM.2021010102

Keywords: Information Science Reference / Media & Communications / Multimedia Technology

Abstract

3D human dance generation from music is an interesting and challenging task in which the aim is to estimate 3D pose from visual and audio information. Existing methods use only skeleton information for this task, which can produce jittery results. In addition, the lack of appropriate evaluation metrics makes it difficult to assess the quality of the generated results. In this paper, the authors explore multimodal dance generation networks by constructing the correspondence between visual and audio cues. Specifically, they propose a 2D prediction module that predicts future frames by fusing visual and audio features, and a 3D conversion module that generates the 3D skeleton from the 2D skeleton. They also propose new evaluation metrics for human dance generation to assess the quality of the generated results. Experimental results indicate that the proposed modules can meet the requirements of authenticity and diversity.
