Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Pose-controllable talking face generation by implicitly modularized audio-visual representation

H Zhou, Y Sun, W Wu, CC Loy… - Proceedings of the …, 2021 - openaccess.thecvf.com
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven
talking face generation, the problem of how to efficiently drive the head pose remains …

Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video

E Tretschk, A Tewari, V Golyanik… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Non-Rigid Neural Radiance Fields (NR-NeRF), a reconstruction and
novel view synthesis approach for general non-rigid dynamic scenes. Our approach takes …

AD-NeRF: Audio-driven neural radiance fields for talking head synthesis

Y Guo, K Chen, S Liang, YJ Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Generating high-fidelity talking head video by fitting to the input audio sequence is a
challenging problem that has received considerable attention recently. In this paper, we …

Expressive talking head generation with granular audio-visual control

B Liang, Y Pan, Z Guo, H Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …

StyleHEAT: One-shot high-resolution editable talking face generation via pre-trained StyleGAN

F Yin, Y Zhang, X Cun, M Cao, Y Fan, X Wang… - European conference on …, 2022 - Springer
One-shot talking face generation aims at synthesizing a high-quality talking face video from
an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a …

Towards fast, accurate and stable 3D dense face alignment

J Guo, X Zhu, Y Yang, F Yang, Z Lei, SZ Li - European Conference on …, 2020 - Springer
Existing methods of 3D dense face alignment mainly concentrate on accuracy, thus limiting the scope of their practical applications. In this paper,
we propose a novel regression framework which strikes a balance among speed, accuracy …

EAMM: One-shot emotional talking face via audio-based emotion-aware motion model

X Ji, H Zhou, K Wang, Q Wu, W Wu, F Xu… - ACM SIGGRAPH 2022 …, 2022 - dl.acm.org
Although significant progress has been made to audio-driven talking face generation,
existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In …

FSGAN: Subject-agnostic face swapping and reenactment

Y Nirkin, Y Keller, T Hassner - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
We present Face Swapping GAN (FSGAN) for face swapping and reenactment.
Unlike previous work, FSGAN is subject agnostic and can be applied to pairs of faces …

Live speech portraits: real-time photorealistic talking-head animation

Y Lu, J Chai, X Cao - ACM Transactions on Graphics (ToG), 2021 - dl.acm.org
To the best of our knowledge, we present the first live system that generates personalized
photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system …