Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
TEMOS: Generating Diverse Human Motions from Textual Descriptions
We address the problem of generating diverse 3D human motions from textual descriptions.
This challenging task requires joint modeling of both modalities: understanding and …
SadTalker: Learning realistic 3D motion coefficients for stylized audio-driven single image talking face animation
Generating talking head videos through a face image and a piece of speech audio still
contains many challenges, i.e., unnatural head movement, distorted expression, and identity …
Human-computer interaction system: A survey of talking-head generation
Virtual human is widely employed in various industries, including personal assistance,
intelligent customer service, and online education, thanks to the rapid development of …
EMO: Emote portrait alive generating expressive portrait videos with audio2video diffusion model under weak conditions
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …
Generating holistic 3D human motion from speech
This work addresses the problem of generating 3D holistic body motions from human
speech. Given a speech recording, we synthesize sequences of 3D body poses, hand …
CodeTalker: Speech-driven 3D facial animation with discrete motion prior
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …
Deep learning for visual speech analysis: A survey
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …
EmoTalk: Speech-driven emotional disentanglement for 3D face animation
Speech-driven 3D face animation aims to generate realistic facial expressions that match
the speech content and emotion. However, existing methods often neglect emotional facial …
Seeing what you said: Talking face generation guided by a lip reading expert
Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
concerning lips given coherent speech input. The previous studies revealed the importance …