A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Deep learning for visual speech analysis: A survey
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …
Speech driven talking face generation from a single image and an emotion condition
Visual emotion expression plays an important role in audiovisual speech communication. In
this work, we propose a novel approach to rendering visual emotion expression in speech …
Talking human face generation: A survey
Talking human face generation aims at synthesizing a natural human face that talks in
correspondence to the given text or audio series. Implementing the recently developed …
Speech driven video editing via an audio-conditioned diffusion model
Taking inspiration from recent developments in visual generative tasks using diffusion
models, we propose a method for end-to-end speech-driven video editing using a denoising …
Deep person generation: A survey from the perspective of face, pose, and cloth synthesis
Deep person generation has attracted extensive research attention due to its wide
applications in virtual agents, video conferencing, online shopping, and art/movie …
Expression-tailored talking face generation with adaptive cross-modal weighting
The key of talking face generation is to synthesize the identity-preserving natural facial
expressions with accurate audio-lip synchronization. To accomplish this, it requires to …
Talking head generation with audio and speech related facial action units
The task of talking head generation is to synthesize a lip synchronized talking head video by
inputting an arbitrary face image and audio clips. Most existing methods ignore the local …
Speech2video: Cross-modal distillation for speech to video generation
S Si, J Wang, X Qu, N Cheng, W Wei, X Zhu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper investigates a novel task of talking face video generation solely from speeches.
The speech-to-video generation technique can spark interesting applications in …
Talking face generation via facial anatomy
S Liu, H Wang - ACM Transactions on Multimedia Computing …, 2023 - dl.acm.org
To generate the corresponding talking face from a speech audio and a face image, it is
essential to match the variations in the facial appearance with the speech audio in subtle …