AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Abstract

Interactive motion synthesis is essential in creating immersive experiences in entertainment applications, such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short in generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency at the same time. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module, and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces training complexity and further improves performance. Through comprehensive quantitative analyses and visual comparisons, we show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency. We also demonstrate the effectiveness of each algorithmic component through ablation studies.

Learning Framework

We introduce the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel framework crafted to generate diverse, high-fidelity motion sequences without the need for prolonged reverse diffusion. Diffusion-based transition models naturally produce diverse multi-modal motion, but their long reverse diffusion process would be too slow for interactive applications. To overcome this challenge, AAMDM combines two synergistic modules: a Generation Module, for rapid initial motion drafting using Denoising Diffusion GANs; and a Polishing Module, for quality improvements using an Auto-regressive Diffusion Model with just two additional denoising steps. Another distinctive feature of AAMDM is that it operates in a learned lower-dimensional latent space rather than the traditional full pose space, which further accelerates the training process.
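
The control flow described above can be illustrated with a small sketch. This is not the authors' implementation: the three "networks" are random linear placeholders, the re-noising between generation steps is a simplified stand-in for proper posterior sampling, and every name (generator, polisher, decode, next_frame, rollout) and every dimension is hypothetical. The number of generation steps is an assumption; only the two polishing steps come from the text. The sketch only shows how a draft latent is produced in a few steps, polished with two more denoising steps, and fed back auto-regressively before being decoded to a pose.

```python
import numpy as np

LATENT_DIM = 8     # assumed size of the learned latent space
POSE_DIM = 32      # assumed size of the full pose vector
GEN_STEPS = 2      # assumed; the text does not state this number
POLISH_STEPS = 2   # "two additional denoising steps" per the text

# Random placeholder weights standing in for trained networks.
_w = np.random.default_rng(0)
W_GEN = _w.normal(scale=0.1, size=(3 * LATENT_DIM, LATENT_DIM))
W_POL = _w.normal(scale=0.1, size=(2 * LATENT_DIM + 1, LATENT_DIM))
W_DEC = _w.normal(scale=0.1, size=(LATENT_DIM, POSE_DIM))


def generator(z_prev, x_t, noise):
    """Placeholder for a Denoising Diffusion GAN generator: predicts a
    clean latent from the noisy latent x_t, the previous frame's latent,
    and an extra random input."""
    return np.tanh(np.concatenate([z_prev, x_t, noise]) @ W_GEN)


def polisher(z_prev, x_t, t):
    """Placeholder for the auto-regressive diffusion denoiser."""
    return np.tanh(np.concatenate([z_prev, x_t, [float(t)]]) @ W_POL)


def decode(z):
    """Placeholder decoder from latent space back to pose space."""
    return z @ W_DEC


def next_frame(z_prev, rng):
    # Generation Module: a few large denoising steps from pure noise.
    x = rng.standard_normal(LATENT_DIM)
    for t in reversed(range(GEN_STEPS)):
        x0 = generator(z_prev, x, rng.standard_normal(LATENT_DIM))
        # Simplified re-noising between steps; none after the last step.
        x = x0 + 0.1 * t * rng.standard_normal(LATENT_DIM)
    # Polishing Module: two further denoising steps applied to the draft.
    for t in reversed(range(POLISH_STEPS)):
        x = polisher(z_prev, x, t)
    return x


def rollout(z0, n_frames, seed=0):
    """Auto-regressive loop: each new latent conditions the next frame."""
    rng = np.random.default_rng(seed)
    z, poses = z0, []
    for _ in range(n_frames):
        z = next_frame(z, rng)
        poses.append(decode(z))
    return np.stack(poses)


poses = rollout(np.zeros(LATENT_DIM), n_frames=5)
print(poses.shape)  # (5, 32)
```

With these placeholders, each frame costs GEN_STEPS + POLISH_STEPS network evaluations, which is the point of the design: the number of evaluations per frame stays small and fixed, in contrast to a long reverse diffusion chain.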

BibTeX

@inproceedings{li2024aamdm,
  title={AAMDM: Accelerated Auto-regressive Motion Diffusion Model},
  author={Li, Tianyu and Qiao, Calvin and Ren, Guanqiao and Yin, KangKang and Ha, Sehoon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1813--1823},
  year={2024}
}