Speech enhancement and dereverberation with diffusion-based generative models
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …
A survey of sound source localization with deep learning methods
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …
TF-GridNet: Integrating full- and sub-band modeling for speech separation
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
Imperceptible, robust, and targeted adversarial examples for automatic speech recognition
Adversarial examples are inputs to machine learning models designed by an adversary to
cause an incorrect output. So far, adversarial examples have been studied most extensively …
A survey on text-dependent and text-independent speaker verification
Speaker verification (SV) aims to detect an individual's identity from his/her voice. SV has
been successfully applied in various areas such as access control, remote service …
StoRM: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation
Diffusion models have shown a great ability at bridging the performance gap between
predictive and generative approaches for speech enhancement. We have shown that they …
WHAMR!: Noisy and reverberant single-channel speech separation
While significant advances have been made with respect to the separation of overlapped
speech signals, studies have been largely constrained to mixtures of clean, near anechoic …
Real acoustic fields: An audio-visual room acoustics dataset and benchmark
We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic
room data from multiple modalities. The dataset includes high-quality and densely captured …
Dual-signal transformation LSTM network for real-time noise suppression
This paper introduces a dual-signal transformation LSTM network (DTLN) for real-time
speech enhancement as part of the Deep Noise Suppression Challenge (DNS-Challenge) …
MIMII DUE: Sound dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental …
In this paper, we introduce MIMII DUE, a new dataset for malfunctioning industrial machine
investigation and inspection with domain shifts due to changes in operational and …