The ethical implications of generative audio models: A systematic literature review

J Barnett - Proceedings of the 2023 AAAI/ACM Conference on AI …, 2023 - dl.acm.org
Generative audio models typically focus their applications on music and speech generation,
with recent models achieving human-like quality in their audio output. This paper conducts a …

Human-computer interaction system: A survey of talking-head generation

R Zhen, W Song, Q He, J Cao, L Shi, J Luo - Electronics, 2023 - mdpi.com
Virtual humans are widely employed in various industries, including personal assistance,
intelligent customer service, and online education, thanks to the rapid development of …

MixSpeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition

X Cheng, T Jin, R Huang, L Li, W Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimedia communication facilitates global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …

Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation

D Yaman, FI Eyiokur, L Bärmann… - … of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In the task of talking face generation, the objective is to generate a face video with lips
synchronized to the corresponding audio while preserving visual details and identity …

A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation

WC Huang, B Peloquin, J Kao, C Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of
source speech to target speech while maintaining translation accuracy. Existing research in …

TransFace: Unit-based audio-visual speech synthesizer for talking head translation

X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Direct speech-to-speech translation achieves high-quality results through the introduction of
discrete units obtained from self-supervised learning. This approach circumvents delays and …

A Systematic Literature Review: Facial Expression and Lip Movement Synchronization of an Audio Track

MH Alshahrani, MS Maashi - IEEE Access, 2024 - ieeexplore.ieee.org
This systematic literature review (SLR) explores the topic of Facial Expression and Lip
Movement Synchronization of an Audio Track in the context of Automatic Dubbing. This SLR …

Talking face generation with audio-deduced emotional landmarks

S Zhai, M Liu, Y Li, Z Gao, L Zhu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The goal of talking face generation is to synthesize a sequence of face images of the
specified identity, ensuring the mouth movements are synchronized with the given audio …

A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation

A Min, C Hu, Y Ren, H Zhao - arXiv preprint arXiv:2502.00374, 2025 - arxiv.org
Current research in speech-to-speech translation (S2ST) primarily concentrates on
translation accuracy and speech naturalness, often overlooking key elements like …