A survey of sound source localization with deep learning methods
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …
Voice separation with an unknown number of multiple speakers
We present a new method for separating a mixed audio sequence, in which multiple voices
speak simultaneously. The new method employs gated neural networks that are trained to …
TF-GridNet: Integrating full- and sub-band modeling for speech separation
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition
This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …
Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation
We propose multi-microphone complex spectral mapping, a simple way of applying deep
learning for time-varying non-linear beamforming, for speaker separation in reverberant …
gpuRIR: A python library for room impulse response simulation with GPU acceleration
The Image Source Method (ISM) is one of the most employed techniques to
calculate acoustic Room Impulse Responses (RIRs), however, its computational complexity …
Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement
Standing upon the intersection of traditional beamformers and deep neural networks, we
propose a causal neural beamformer paradigm called Embedding and Beamforming, and …
Insights into deep non-linear filters for improved multi-channel speech enhancement
K Tesch, T Gerkmann - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
The key advantage of using multiple microphones for speech enhancement is that spatial
filtering can be used to complement the tempo-spectral processing. In a traditional setting …
SpatialNet: Extensively learning spatial information for multichannel joint speech separation, denoising and dereverberation
This work proposes a neural network to extensively exploit spatial information for
multichannel joint speech separation, denoising and dereverberation, named SpatialNet. In …
Towards unified all-neural beamforming for time and frequency domain speech separation
Recently, frequency domain all-neural beamforming methods have achieved remarkable
progress for multichannel speech separation. In parallel, the integration of time domain …