Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

A survey of sound source localization with deep learning methods

PA Grumiaux, S Kitić, L Girin, A Guérin - The Journal of the Acoustical …, 2022 - pubs.aip.org
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …

TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …

Imperceptible, robust, and targeted adversarial examples for automatic speech recognition

Y Qin, N Carlini, G Cottrell… - … on machine learning, 2019 - proceedings.mlr.press
Adversarial examples are inputs to machine learning models designed by an adversary to
cause an incorrect output. So far, adversarial examples have been studied most extensively …

A survey on text-dependent and text-independent speaker verification

Y Tu, W Lin, MW Mak - IEEE Access, 2022 - ieeexplore.ieee.org
Speaker verification (SV) aims to detect an individual's identity from his/her voice. SV has
been successfully applied in various areas such as access control, remote service …

StoRM: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation

JM Lemercier, J Richter, S Welker… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Diffusion models have shown a great ability at bridging the performance gap between
predictive and generative approaches for speech enhancement. We have shown that they …

WHAMR!: Noisy and reverberant single-channel speech separation

M Maciejewski, G Wichern, E McQuinn… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
While significant advances have been made with respect to the separation of overlapping
speech signals, studies have been largely constrained to mixtures of clean, near anechoic …

Real acoustic fields: An audio-visual room acoustics dataset and benchmark

Z Chen, ID Gebru, C Richardt… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic
room data from multiple modalities. The dataset includes high-quality and densely captured …

Dual-signal transformation LSTM network for real-time noise suppression

NL Westhausen, BT Meyer - arXiv preprint arXiv:2005.07551, 2020 - arxiv.org
This paper introduces a dual-signal transformation LSTM network (DTLN) for real-time
speech enhancement as part of the Deep Noise Suppression Challenge (DNS-Challenge) …

MIMII DUE: Sound dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental …

R Tanabe, H Purohit, K Dohi, T Endo… - … IEEE Workshop on …, 2021 - ieeexplore.ieee.org
In this paper, we introduce MIMII DUE, a new dataset for malfunctioning industrial machine
investigation and inspection with domain shifts due to changes in operational and …