Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Insights into deep non-linear filters for improved multi-channel speech enhancement

K Tesch, T Gerkmann - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
The key advantage of using multiple microphones for speech enhancement is that spatial
filtering can be used to complement the tempo-spectral processing. In a traditional setting …

Multi-modal multi-channel target speech separation

R Gu, SX Zhang, Y Xu, L Chen… - IEEE Journal of …, 2020 - ieeexplore.ieee.org
Target speech separation refers to extracting a target speaker's voice from an overlapped
audio of simultaneous talkers. Previously the use of visual modality for target speech …

Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement

A Li, W Liu, C Zheng, X Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Standing upon the intersection of traditional beamformers and deep neural networks, we
propose a causal neural beamformer paradigm called Embedding and Beamforming, and …

Towards unified all-neural beamforming for time and frequency domain speech separation

R Gu, SX Zhang, Y Zou, D Yu - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Recently, frequency domain all-neural beamforming methods have achieved remarkable
progress for multichannel speech separation. In parallel, the integration of time domain …

Move2Hear: Active audio-visual source separation

S Majumder, Z Al-Halah… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We introduce the active audio-visual source separation problem, where an agent must move
intelligently in order to better isolate the sounds coming from an object of interest in its …

Multi-channel speech separation using spatially selective deep non-linear filters

K Tesch, T Gerkmann - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
In a multi-channel separation task with multiple speakers, we aim to recover all individual
speech signals from the mixture. In contrast to single-channel approaches, which rely on the …

ReZero: Region-customizable sound extraction

R Gu, Y Luo - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
We introduce region-customizable sound extraction (ReZero), a general and flexible
framework for the multi-channel region-wise sound extraction (R-SE) task. R-SE task aims at …

Complex neural spatial filter: Enhancing multi-channel target speech separation in complex domain

R Gu, SX Zhang, Y Zou, D Yu - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
To date, mainstream target speech separation (TSS) approaches are formulated to estimate
the complex ratio mask (cRM) of target speech in time-frequency domain under supervised …