The ethical implications of generative audio models: A systematic literature review
J Barnett - Proceedings of the 2023 AAAI/ACM Conference on AI …, 2023 - dl.acm.org
Generative audio models typically focus their applications in music and speech generation,
with recent models having human-like quality in their audio output. This paper conducts a …
Human-computer interaction system: A survey of talking-head generation
Virtual human is widely employed in various industries, including personal assistance,
intelligent customer service, and online education, thanks to the rapid development of …
Mixspeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition
Multi-media communications facilitate global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …
Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation
In the task of talking face generation the objective is to generate a face video with lips
synchronized to the corresponding audio while preserving visual details and identity …
A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of
source speech to target speech while maintaining translation accuracy. Existing research in …
Transface: Unit-based audio-visual speech synthesizer for talking head translation
Direct speech-to-speech translation achieves high-quality results through the introduction of
discrete units obtained from self-supervised learning. This approach circumvents delays and …
A Systematic Literature Review: Facial Expression and Lip Movement Synchronization of an Audio Track
MH Alshahrani, MS Maashi - IEEE Access, 2024 - ieeexplore.ieee.org
This systematic literature review (SLR) explores the topic of Facial Expression and Lip
Movement Synchronization of an Audio Track in the context of Automatic Dubbing. This SLR …
Talking face generation with audio-deduced emotional landmarks
The goal of talking face generation is to synthesize a sequence of face images of the
specified identity, ensuring the mouth movements are synchronized with the given audio …
Av2av: Direct audio-visual speech to audio-visual speech translation with unified audio-visual speech representation
This paper proposes a novel direct Audio-Visual Speech to Audio-Visual Speech
Translation (AV2AV) framework where the input and output of the system are multimodal (i.e. …
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Current research in speech-to-speech translation (S2ST) primarily concentrates on
translation accuracy and speech naturalness, often overlooking key elements like …