A survey of sound source localization with deep learning methods

PA Grumiaux, S Kitić, L Girin, A Guérin - The Journal of the Acoustical …, 2022 - pubs.aip.org
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …
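
As a rough illustration of the family of methods such a survey covers (not an example taken from the article; the microphone count, feature choice, and azimuth grid below are arbitrary assumptions), a classification-based localizer maps multichannel spectrogram features to a posterior over candidate directions of arrival:

import torch
import torch.nn as nn

class DoaClassifier(nn.Module):
    """Toy DOA classifier: multichannel STFT features -> azimuth class posterior."""
    def __init__(self, n_mics=4, n_classes=360):
        super().__init__()
        # input: (batch, 2*n_mics, n_freq, n_frames) = stacked magnitude and phase
        self.conv = nn.Sequential(
            nn.Conv2d(2 * n_mics, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),        # pool over frequency and time
        )
        self.fc = nn.Linear(64, n_classes)       # one logit per azimuth bin

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1)) # (batch, n_classes)

logits = DoaClassifier()(torch.randn(1, 8, 257, 100))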

Voice separation with an unknown number of multiple speakers

E Nachmani, Y Adi, L Wolf - International Conference on …, 2020 - proceedings.mlr.press
We present a new method for separating a mixed audio sequence, in which multiple voices
speak simultaneously. The new method employs gated neural networks that are trained to …
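
The snippet does not spell out the training objective; as background, here is a minimal sketch of the permutation-invariant SI-SNR loss widely used for this task (a generic recipe, not necessarily the paper's exact objective, which also has to handle an unknown speaker count):

import itertools
import torch

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB for zero-mean 1-D signals, per speaker."""
    est, ref = est - est.mean(-1, keepdim=True), ref - ref.mean(-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) / (ref.pow(2).sum(-1, keepdim=True) + eps) * ref
    return 10 * torch.log10(proj.pow(2).sum(-1) / ((est - proj).pow(2).sum(-1) + eps) + eps)

def pit_si_snr_loss(est, ref):
    """est, ref: (n_spk, n_samples); best speaker permutation, negated for minimization."""
    perms = itertools.permutations(range(ref.shape[0]))
    return -max(si_snr(est[list(p)], ref).mean() for p in perms)

loss = pit_si_snr_loss(torch.randn(2, 16000), torch.randn(2, 16000))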

TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
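
A minimal sketch of the full-band/sub-band idea as described in the snippet (a toy block, not the published architecture, which also includes cross-frame self-attention and specific layer configurations): one recurrent pass along time within each frequency (sub-band) and one along frequency within each frame (full-band), applied to a (batch, channels, frequency, time) tensor:

import torch
import torch.nn as nn

class GridBlock(nn.Module):
    """Toy full-band/sub-band block; x is (batch, channels, n_freq, n_frames)."""
    def __init__(self, ch=32, hidden=64):
        super().__init__()
        self.sub = nn.LSTM(ch, hidden, batch_first=True, bidirectional=True)   # along time, per frequency
        self.full = nn.LSTM(ch, hidden, batch_first=True, bidirectional=True)  # along frequency, per frame
        self.proj_sub = nn.Linear(2 * hidden, ch)
        self.proj_full = nn.Linear(2 * hidden, ch)

    def forward(self, x):
        b, c, f, t = x.shape
        # sub-band path: treat each frequency as a sequence over time
        y = x.permute(0, 2, 3, 1).reshape(b * f, t, c)
        x = x + self.proj_sub(self.sub(y)[0]).reshape(b, f, t, c).permute(0, 3, 1, 2)
        # full-band path: treat each frame as a sequence over frequency
        y = x.permute(0, 3, 2, 1).reshape(b * t, f, c)
        x = x + self.proj_full(self.full(y)[0]).reshape(b, t, f, c).permute(0, 3, 2, 1)
        return x

out = GridBlock()(torch.randn(1, 32, 65, 50))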

VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition

N Kanda, J Wu, X Wang, Z Chen, J Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …
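
For context on the t-SOT part of the title, here is a toy sketch of token-level serialized-output-training style label preparation (the tokens and timestamps below are made up, and the full framework is defined in the paper): tokens of overlapping utterances are merged into a single stream in time order, with a special channel-change token marking each switch between two virtual output channels:

def serialize_tsot(utterances):
    """utterances: list of (channel, [(time, token), ...]); returns one token stream."""
    events = sorted((t, ch, tok) for ch, toks in utterances for t, tok in toks)
    stream, prev_ch = [], None
    for _, ch, tok in events:
        if prev_ch is not None and ch != prev_ch:
            stream.append("<cc>")        # hypothetical channel-change token
        stream.append(tok)
        prev_ch = ch
    return stream

print(serialize_tsot([(0, [(0.0, "hello"), (0.6, "world")]),
                      (1, [(0.4, "good"), (0.8, "morning")])]))
# -> ['hello', '<cc>', 'good', '<cc>', 'world', '<cc>', 'morning']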

Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation

ZQ Wang, P Wang, DL Wang - IEEE/ACM transactions on …, 2021 - ieeexplore.ieee.org
We propose multi-microphone complex spectral mapping, a simple way of applying deep
learning for time-varying non-linear beamforming, for speaker separation in reverberant …
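
A minimal sketch of the input/output convention the snippet suggests (the layer sizes and six-microphone setup are arbitrary assumptions, not the paper's network): the real and imaginary parts of all microphones' STFTs are stacked as input features, and the network regresses the real and imaginary parts of the target at a reference microphone:

import torch
import torch.nn as nn

def stack_ri(stft_mc):
    """stft_mc: (batch, n_mics, n_freq, n_frames) complex -> real/imag stacked on channels."""
    return torch.cat([stft_mc.real, stft_mc.imag], dim=1)

# Toy network: predicts the target's real and imaginary STFT at a reference microphone
net = nn.Sequential(
    nn.Conv2d(2 * 6, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=1),   # 2 output maps: real and imaginary parts
)

x = torch.randn(1, 6, 257, 100, dtype=torch.complex64)
ri_est = net(stack_ri(x))              # (1, 2, 257, 100)
est_stft = torch.complex(ri_est[:, 0], ri_est[:, 1])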

gpuRIR: A python library for room impulse response simulation with GPU acceleration

D Diaz-Guerra, A Miguel, JR Beltran - Multimedia Tools and Applications, 2021 - Springer
The Image Source Method (ISM) is one of the most employed techniques to
calculate acoustic Room Impulse Responses (RIRs); however, its computational complexity …
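
A short usage sketch of the library, written from memory of its public interface; the function and argument names (beta_SabineEstimation, att2t_SabineEstimator, t2n, simulateRIR) should be checked against the current gpuRIR documentation, and a CUDA-capable GPU is required:

import numpy as np
import gpuRIR

fs = 16000
room_sz = [5.0, 4.0, 3.0]                      # room dimensions in meters
pos_src = np.array([[1.0, 1.5, 1.6]])          # one source
pos_rcv = np.array([[2.5, 2.0, 1.0],           # two microphones
                    [2.6, 2.0, 1.0]])
T60 = 0.5                                      # target reverberation time in seconds

beta = gpuRIR.beta_SabineEstimation(room_sz, T60)   # wall reflection coefficients
Tmax = gpuRIR.att2t_SabineEstimator(60.0, T60)      # RIR length for a 60 dB decay
nb_img = gpuRIR.t2n(Tmax, room_sz)                  # image sources needed per axis
rirs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv, nb_img, Tmax, fs)
# expected shape: (n_src, n_rcv, n_samples)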

Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement

A Li, W Liu, C Zheng, X Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Standing upon the intersection of traditional beamformers and deep neural networks, we
propose a causal neural beamformer paradigm called Embedding and Beamforming, and …
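
Independently of the paper's specific architecture, the final operation of such all-neural beamformers is typically a filter-and-sum with network-predicted, time-varying complex weights; a minimal sketch (the weights below are random stand-ins for a DNN output):

import torch

def apply_neural_beamformer(weights, stft_mc):
    """Filter-and-sum with time-varying complex weights.

    weights, stft_mc: (batch, n_mics, n_freq, n_frames) complex tensors;
    in an all-neural beamformer the weights come from a DNN.
    """
    return (weights.conj() * stft_mc).sum(dim=1)   # (batch, n_freq, n_frames)

x = torch.randn(1, 4, 257, 100, dtype=torch.complex64)
w = torch.randn(1, 4, 257, 100, dtype=torch.complex64)   # stand-in for network output
enhanced = apply_neural_beamformer(w, x)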

Insights into deep non-linear filters for improved multi-channel speech enhancement

K Tesch, T Gerkmann - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
The key advantage of using multiple microphones for speech enhancement is that spatial
filtering can be used to complement the tempo-spectral processing. In a traditional setting …
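
A toy sketch in the spirit of the snippet, namely a non-linear filter that handles spatial and temporal processing jointly rather than in separate stages (not the paper's model; layer sizes are assumptions): a recurrent network run along time within each frequency band, fed with all microphone channels at once:

import torch
import torch.nn as nn

class JointSpatialTemporalFilter(nn.Module):
    """Toy joint non-linear multichannel filter: one LSTM along time per frequency,
    fed with the real/imag parts of all microphones at that frequency."""
    def __init__(self, n_mics=4, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(2 * n_mics, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)               # real and imaginary of the estimate

    def forward(self, stft_mc):                       # (batch, n_mics, n_freq, n_frames) complex
        b, m, f, t = stft_mc.shape
        x = torch.cat([stft_mc.real, stft_mc.imag], dim=1)      # (b, 2m, f, t)
        x = x.permute(0, 2, 3, 1).reshape(b * f, t, 2 * m)
        y = self.out(self.lstm(x)[0]).reshape(b, f, t, 2)
        return torch.complex(y[..., 0], y[..., 1])    # (b, f, t)

est = JointSpatialTemporalFilter()(torch.randn(1, 4, 257, 50, dtype=torch.complex64))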

SpatialNet: Extensively learning spatial information for multichannel joint speech separation, denoising and dereverberation

C Quan, X Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This work proposes a neural network to extensively exploit spatial information for
multichannel joint speech separation, denoising and dereverberation, named SpatialNet. In …
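
A toy sketch of how spatial information can be exploited by interleaving narrow-band processing (along time within each frequency, where inter-channel cues live) with cross-band processing (along frequency); the sizes and layer choices are assumptions, not the published SpatialNet design:

import torch
import torch.nn as nn

class NarrowCrossBlock(nn.Module):
    """Toy block: self-attention along time per frequency, then a frequency convolution."""
    def __init__(self, ch=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
        self.cross = nn.Conv1d(ch, ch, kernel_size=5, padding=2)

    def forward(self, x):                       # x: (batch, ch, n_freq, n_frames)
        b, c, f, t = x.shape
        # narrow-band: each frequency is an independent sequence over time
        y = x.permute(0, 2, 3, 1).reshape(b * f, t, c)
        y = self.attn(y, y, y)[0].reshape(b, f, t, c).permute(0, 3, 1, 2)
        x = x + y
        # cross-band: convolve along frequency within each frame
        y = x.permute(0, 3, 1, 2).reshape(b * t, c, f)
        y = self.cross(y).reshape(b, t, c, f).permute(0, 2, 3, 1)
        return x + y

out = NarrowCrossBlock()(torch.randn(1, 32, 65, 50))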

Towards unified all-neural beamforming for time and frequency domain speech separation

R Gu, SX Zhang, Y Zou, D Yu - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Recently, frequency domain all-neural beamforming methods have achieved remarkable
progress for multichannel speech separation. In parallel, the integration of time domain …