Google Scholar

Watch or listen: Robust audio-visual speech recognition with visual corruption modeling and reliability scoring

J Hong, M Kim, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper deals with Audio-Visual Speech Recognition (AVSR) under multimodal input
corruption situation where audio inputs and visual inputs are both corrupted, which is not …

Cited by 38, 7 versions

Distinguishing homophenes using multi-head visual-audio memory for lip reading

M Kim, JH Yeo, YM Ro - Proceedings of the AAAI conference on …, 2022 - ojs.aaai.org

Recognizing speech from silent lip movement, which is called lip reading, is a challenging
task due to 1) the inherent information insufficiency of lip movement to fully represent the …

Cited by 57, 7 versions

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier

Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

Cited by 8, 2 versions

Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge

M Kim, JH Yeo, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …

Cited by 13, 7 versions

DiffV2S: Diffusion-based video-to-speech synthesis with vision-guided speaker embedding

J Choi, J Hong, YM Ro - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Recent research has demonstrated impressive results in video-to-speech synthesis which
involves reconstructing speech solely from visual input. However, previous works have …

Cited by 12, 7 versions

SVTS: scalable video-to-speech synthesis

R Mira, A Haliassos, S Petridis, BW Schuller… - arXiv preprint arXiv …, 2022 - arxiv.org

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip
movements into the corresponding audio. This task has received an increasing amount of …

Cited by 33, 9 versions

A place for (socio) linguistics in audio deepfake detection and discernment: Opportunities for convergence and interdisciplinary collaboration

C Mallinson, VP Janeja, C Evered… - Language and …, 2024 - Wiley Online Library

Deepfakes, particularly audio deepfakes, have become pervasive and pose unique, ever-changing
threats to society. This paper reviews the current research landscape on audio …

Cited by 1, 2 versions

Speaker-adaptive lip reading with user-dependent padding

M Kim, H Kim, YM Ro - European Conference on Computer Vision, 2022 - Springer

Lip reading aims to predict speech based on lip movements alone. As it focuses on visual
information to model the speech, its performance is inherently sensitive to personal lip …

Cited by 24, 8 versions

Lip-to-speech synthesis in the wild with multi-task learning

M Kim, J Hong, YM Ro - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

Recent studies have shown impressive performance in Lip-to-speech synthesis that aims to
reconstruct speech from visual information alone. However, they have been suffering from …

Cited by 22, 5 versions

Intelligible lip-to-speech synthesis with speech units

J Choi, M Kim, YM Ro - arXiv preprint arXiv:2305.19603, 2023 - arxiv.org

In this paper, we propose a novel Lip-to-Speech synthesis (L2S) framework, for synthesizing
intelligible speech from a silent lip movement video. Specifically, to complement the …

Cited by 17, 7 versions

Speech reconstruction with reminiscent sound via visual voice memory
