A comprehensive review of data‐driven co‐speech gesture generation

S Nyatsanga, T Kucherenko, C Ahuja… - Computer Graphics …, 2023 - Wiley Online Library
Gestures that accompany speech are an essential part of natural and efficient embodied
human communication. The automatic generation of such co‐speech gestures is a long …

GestureDiffuCLIP: Gesture diffusion model with CLIP latents

T Ao, Z Zhang, L Liu - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org
The automatic generation of stylized co-speech gestures has recently received increasing
attention. Previous systems typically allow style control via predefined text labels or example …

Listen, denoise, action! Audio-driven motion synthesis with diffusion models

S Alexanderson, R Nagy, J Beskow… - ACM Transactions on …, 2023 - dl.acm.org
Diffusion models have experienced a surge of interest as highly expressive yet efficiently
trainable probabilistic models. We show that these models are an excellent fit for …

DiffuseStyleGesture: Stylized audio-driven co-speech gesture generation with diffusion models

S Yang, Z Wu, M Li, Z Zhang, L Hao, W Bao… - arXiv preprint arXiv …, 2023 - arxiv.org
Beyond speech, gestures are part of the art of communication. Automatic co-speech
gesture generation draws much attention in computer animation. It is a challenging task due …

QPGesture: Quantization-based and phase-guided motion matching for natural speech-driven gesture generation

S Yang, Z Wu, M Li, Z Zhang, L Hao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven gesture generation is highly challenging due to the random jitters of human
motion. In addition, there is an inherent asynchronous relationship between human speech …

Co-speech gesture video generation via motion-decoupled diffusion model

X He, Q Huang, Z Zhang, Z Lin, Z Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Co-speech gestures, if presented in the lively form of videos, can achieve superior visual
effects in human-machine interaction. While previous works mostly generate structural …

The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings

T Kucherenko, R Nagy, Y Yoon, J Woo… - Proceedings of the 25th …, 2023 - dl.acm.org
This paper reports on the GENEA Challenge 2023, in which participating teams built speech-
driven gesture-generation systems using the same speech and motion dataset, followed by …

Emotional speech-driven 3d body animation via disentangled latent diffusion

K Chhatre, N Athanasiou, G Becherini… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing methods for synthesizing 3D human gestures from speech have shown promising
results, but they do not explicitly model the impact of emotions on the generated gestures …

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

S Mehta, A Deichler, J O'Regan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although humans engaged in face-to-face conversation simultaneously communicate both
verbally and non-verbally, methods for joint and unified synthesis of speech audio and co …

A survey on deep multi-modal learning for body language recognition and generation

L Liu, L Gao, W Lei, F Ma, X Lin, J Wang - arXiv preprint arXiv:2308.08849, 2023 - arxiv.org
Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …