EmoTalk: Speech-driven emotional disentanglement for 3D face animation

Z Peng, H Wu, Z Song, H Xu, X Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven 3D face animation aims to generate realistic facial expressions that match
the speech content and emotion. However, existing methods often neglect emotional facial …

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arXiv preprint arXiv:2201.02184, 2022 - arxiv.org
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …

FaceFormer: Speech-driven 3D facial animation with transformers

Y Fan, Z Lin, J Saito, W Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Speech-driven 3D facial animation is challenging due to the complex geometry of human
faces and the limited availability of 3D audio-visual data. Prior works typically focus on …

DiffSHEG: A diffusion-based approach for real-time speech-driven holistic 3D expression and gesture generation

J Chen, Y Liu, J Wang, A Zeng, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose DiffSHEG, a diffusion-based approach for speech-driven holistic 3D
expression and gesture generation. While previous works focused on co-speech gesture or …

LipSync3D: Data-efficient learning of personalized 3D talking faces from video using pose and lighting normalization

A Lahiri, V Kwatra, C Frueh, J Lewis… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we present a video-based learning framework for animating personalized 3D
talking faces from audio. We introduce two training-time data normalizations that significantly …

Language-guided music recommendation for video via prompt analogies

D McKee, J Salamon, J Sivic… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a method to recommend music for an input video while allowing a user to guide
music selection with free-form natural language. A key challenge of this problem setting is …

Audio-Driven Facial Animation with Deep Learning: A Survey

D Jiang, J Chang, L You, S Bian, R Kosk, G Maguire - Information, 2024 - mdpi.com
Audio-driven facial animation is a rapidly evolving field that aims to generate realistic facial
expressions and lip movements synchronized with a given audio input. This survey provides …

Learnable irrelevant modality dropout for multimodal action recognition on modality-specific annotated videos

S Alfasly, J Lu, C Xu, Y Zou - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Under the assumption that a video dataset is multimodally annotated, in which both the auditory and
visual modalities are labeled or class-relevant, current multimodal methods apply …

Missing modality robustness in semi-supervised multi-modal semantic segmentation

H Maheshwari, YC Liu, Z Kira - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Using multiple spatial modalities has been proven helpful in improving semantic
segmentation performance. However, there are several real-world challenges that have yet …

LaughTalk: Expressive 3D talking head generation with laughter

K Sung-Bin, L Hyun, DH Hong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Laughter is a unique expression, essential to affirmative social interactions among humans.
Although current 3D talking head generation methods produce convincing verbal …