Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

Y Luo, Z Chen, T Yoshioka - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Recent studies in deep learning-based speech separation have proven the superiority of
time-domain approaches to conventional time-frequency-based methods. Unlike the time …

LibriMix: An open-source dataset for generalizable speech separation

J Cosentino, M Pariente, S Cornell, A Deleforge… - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, wsj0-2mix has become the reference dataset for single-channel speech
separation. Most deep learning-based speech separation models today are benchmarked …

Continuous speech separation: Dataset and analysis

Z Chen, T Yoshioka, L Lu, T Zhou… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper describes a dataset and protocols for evaluating continuous speech separation
algorithms. Most prior speech separation studies use pre-segmented audio signals, which …

Asteroid: the PyTorch-based audio source separation toolkit for researchers

M Pariente, S Cornell, J Cosentino… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper describes Asteroid, the PyTorch-based audio source separation toolkit for
researchers. Inspired by the most successful neural source separation systems, it provides …

On loss functions for supervised monaural time-domain speech enhancement

M Kolbæk, ZH Tan, SH Jensen… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Many deep learning-based speech enhancement algorithms are designed to minimize the
mean-square error (MSE) in some transform domain between a predicted and a target …

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

M Delcroix, T Ochiai, K Zmolikova… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Target speech extraction, which extracts a single target source in a mixture given clues
about the target speaker, has attracted increasing attention. We have recently proposed …

Multi-modal multi-channel target speech separation

R Gu, SX Zhang, Y Xu, L Chen… - IEEE Journal of …, 2020 - ieeexplore.ieee.org
Target speech separation refers to extracting a target speaker's voice from an overlapped
audio of simultaneous talkers. Previously the use of visual modality for target speech …

The Intel neuromorphic DNS challenge

J Timcheck, SB Shrestha, DBD Rubin… - Neuromorphic …, 2023 - iopscience.iop.org
A critical enabler for progress in neuromorphic computing research is the ability to
transparently evaluate different neuromorphic solutions on important tasks and to compare …

A consolidated view of loss functions for supervised deep learning-based speech enhancement

S Braun, I Tashev - 2021 44th International Conference on …, 2021 - ieeexplore.ieee.org
Deep learning-based speech enhancement for real-time applications recently made large
advancements. Due to the lack of a tractable perceptual optimization target, many myths …