An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Continuous speech separation: Dataset and analysis

Z Chen, T Yoshioka, L Lu, T Zhou… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper describes a dataset and protocols for evaluating continuous speech separation
algorithms. Most prior speech separation studies use pre-segmented audio signals, which …

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Z Zhang, Y Xu, M Yu, SX Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Speech separation algorithms are often used to separate the target speech from other
interfering sources. However, purely neural network based speech separation systems often …

Multi-modal multi-channel target speech separation

R Gu, SX Zhang, Y Xu, L Chen… - IEEE Journal of …, 2020 - ieeexplore.ieee.org
Target speech separation refers to extracting a target speaker's voice from an overlapped
audio of simultaneous talkers. Previously the use of visual modality for target speech …

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

[PDF][PDF] Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.

R Gu, L Chen, SX Zhang, J Zheng, Y Xu, M Yu, D Su… - Interspeech, 2019 - isca-archive.org
The recent exploration of deep learning for supervised speech separation has significantly
accelerated the progress on the multi-talker speech separation problem. The multi-channel …

Towards unified all-neural beamforming for time and frequency domain speech separation

R Gu, SX Zhang, Y Zou, D Yu - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Recently, frequency domain all-neural beamforming methods have achieved remarkable
progress for multichannel speech separation. In parallel, the integration of time domain …

A comprehensive study of speech separation: spectrogram vs waveform separation

F Bahmaninezhad, J Wu, R Gu, SX Zhang, Y Xu… - arXiv preprint arXiv …, 2019 - arxiv.org
Speech separation has been studied widely for single-channel close-talk microphone
recordings over the past few years; developed solutions are mostly in frequency-domain …

Advances in online audio-visual meeting transcription

T Yoshioka, I Abramovski, C Aksoylar… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …

ClearBuds: wireless binaural earbuds for learning-based speech enhancement

I Chatterjee, M Kim, V Jayaram, S Gollakota… - Proceedings of the 20th …, 2022 - dl.acm.org
We present ClearBuds, the first hardware and software system that utilizes a neural network
to enhance speech streamed from two wireless earbuds. Real-time speech enhancement for …