EmoTalk: Speech-driven emotional disentanglement for 3D face animation
Speech-driven 3D face animation aims to generate realistic facial expressions that match
the speech content and emotion. However, existing methods often neglect emotional facial …
Learning audio-visual speech representation by masked multimodal cluster prediction
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …
FaceFormer: Speech-driven 3D facial animation with transformers
Speech-driven 3D facial animation is challenging due to the complex geometry of human
faces and the limited availability of 3D audio-visual data. Prior works typically focus on …
DiffSHEG: A diffusion-based approach for real-time speech-driven holistic 3D expression and gesture generation
We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D
Expression and Gesture generation. While previous works focused on co-speech gesture or …
LipSync3D: Data-efficient learning of personalized 3D talking faces from video using pose and lighting normalization
In this paper, we present a video-based learning framework for animating personalized 3D
talking faces from audio. We introduce two training-time data normalizations that significantly …
Language-guided music recommendation for video via prompt analogies
We propose a method to recommend music for an input video while allowing a user to guide
music selection with free-form natural language. A key challenge of this problem setting is …
Audio-Driven Facial Animation with Deep Learning: A Survey
Audio-driven facial animation is a rapidly evolving field that aims to generate realistic facial
expressions and lip movements synchronized with a given audio input. This survey provides …
Learnable irrelevant modality dropout for multimodal action recognition on modality-specific annotated videos
S Alfasly, J Lu, C Xu, Y Zou - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Under the assumption that a video dataset is annotated across modalities, in which both the auditory and
visual modalities are labeled or class-relevant, current multimodal methods apply …
Missing modality robustness in semi-supervised multi-modal semantic segmentation
Using multiple spatial modalities has been proven helpful in improving semantic
segmentation performance. However, there are several real-world challenges that have yet …
LaughTalk: Expressive 3D talking head generation with laughter
Laughter is a unique expression, essential to affirmative social interactions of humans.
Although current 3D talking head generation methods produce convincing verbal …