An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract one or more target speech signals, respectively, from a mixture of sounds …

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier
Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

Lip to speech synthesis with visual context attentional GAN

M Kim, J Hong, YM Ro - Advances in Neural Information …, 2021 - proceedings.neurips.cc
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual
Context Attentional GAN (VCA-GAN), which can jointly model local and global lip …

End-to-end video-to-speech synthesis using generative adversarial networks

R Mira, K Vougioukas, P Ma, S Petridis… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video-to-speech is the process of reconstructing the audio speech from a video of a spoken
utterance. Previous approaches to this task have relied on a two-step process where an …

Nautilus: a versatile voice cloning system

HT Luong, J Yamagishi - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …

SVTS: scalable video-to-speech synthesis

R Mira, A Haliassos, S Petridis, BW Schuller… - arXiv preprint arXiv …, 2022 - arxiv.org
Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip
movements into the corresponding audio. This task has received an increasing amount of …

Lip-to-speech synthesis in the wild with multi-task learning

M Kim, J Hong, YM Ro - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Recent studies have shown impressive performance in lip-to-speech synthesis, which aims to
reconstruct speech from visual information alone. However, they have suffered from …

Lipsound2: Self-supervised pre-training for lip-to-speech reconstruction and lip reading

L Qu, C Weber, S Wermter - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
The aim of this work is to investigate the impact of crossmodal self-supervised pre-training
for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio …

Vision+ x: A survey on multimodal learning in the light of data

Y Zhu, Y Wu, N Sebe, Y Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
We perceive and communicate with the world in a multisensory manner, where
different information sources are sophisticatedly processed and interpreted by separate …

SpeeChin: A smart necklace for silent speech recognition

R Zhang, M Chen, B Steeper, Y Li, Z Yan… - Proceedings of the …, 2021 - dl.acm.org
This paper presents SpeeChin, a smart necklace that can recognize 54 English and 44
Chinese silent speech commands. A customized infrared (IR) imaging system is mounted on …