- Academic Search

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 304

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier

Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

Cited by 8

ReVISE: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration

WN Hsu, T Remez, B Shi… - Proceedings of the …, 2023 - openaccess.thecvf.com

Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present …

Cited by 18

Interspeech 2022 audio deep packet loss concealment challenge

L Diener, S Sootla, S Branets, A Saabas… - arXiv preprint arXiv …, 2022 - arxiv.org

Audio Packet Loss Concealment (PLC) is the hiding of gaps in audio streams caused by
data transmission failures in packet switched networks. This is a common problem, and of …

Cited by 45

Can audio-visual integration strengthen robustness under multimodal attacks?

Y Tian, C Xu - Proceedings of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com

In this paper, we propose to make a systematic study on machines' multisensory perception
under attacks. We use the audio-visual event recognition task against multimodal …

Cited by 44

SpeechPainter: Text-conditioned speech inpainting

Z Borsos, M Sharifi, M Tagliasacchi - arXiv preprint arXiv:2202.07273, 2022 - arxiv.org

We propose SpeechPainter, a model for filling in gaps of up to one second in speech
samples by leveraging an auxiliary textual input. We demonstrate that the model performs …

Cited by 32

Deep prior-based audio inpainting using multi-resolution harmonic convolutional neural networks

F Miotello, M Pezzoli, L Comanducci… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In this manuscript, we propose a novel method to perform audio inpainting, i.e., the
restoration of audio signals presenting multiple missing parts. Audio inpainting can be …

Cited by 11

Diffusion-based audio inpainting

E Moliner, V Välimäki - arXiv preprint arXiv:2305.15266, 2023 - arxiv.org

Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most of
existing methods produce plausible reconstructions when the gap lengths are short, but …

Cited by 16

Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions

S Ghosh, S Sarkar, S Ghosh, F Zalkow, ND Jana - Applied Intelligence, 2024 - Springer

Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in
the realm of audio-visual learning. AVSS transforms one speaker's speech into another's …

Cited by 4

ReVISE: Self-supervised speech resynthesis with visual input for universal and generalized speech enhancement

WN Hsu, T Remez, B Shi, J Donley, Y Adi - arXiv preprint arXiv …, 2022 - arxiv.org

Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present …

Cited by 14

Audio-visual speech inpainting with deep learning

An overview of deep-learning-based audio-visual speech enhancement and separation