Closing the gap between time-domain multi-channel speech enhancement on real and simulation conditions

W Zhang, J Shi, C Li, S Watanabe… - 2021 IEEE Workshop on …, 2021 - ieeexplore.ieee.org
Deep learning based time-domain models, e.g., Conv-TasNet, have shown great potential
in both single-channel and multi-channel speech enhancement. However, many …

On loss functions and evaluation metrics for music source separation

E Gusó, J Pons, S Pascual… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We investigate which loss functions provide better separations via benchmarking an
extensive set of those for music source separation. To that end, we first survey the most …

PodcastMix: A dataset for separating music and speech in podcasts

N Schmidt, J Pons, M Miron - arXiv preprint arXiv:2207.07403, 2022 - arxiv.org
We introduce PodcastMix, a dataset formalizing the task of separating background music
and foreground speech in podcasts. We aim at defining a benchmark suitable for training …

Adversarial permutation invariant training for universal sound separation

E Postolache, J Pons, S Pascual… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Universal sound separation consists of separating mixes with arbitrary sounds of different
types, and permutation invariant training (PIT) is used to train source agnostic models that …

SpeakerAugment: Data augmentation for generalizable source separation via speaker parameter manipulation

K Wang, Y Yang, H Huang, Y Hu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Existing speech separation models based on deep learning typically generalize poorly due
to domain mismatch. In this paper, we propose SpeakerAugment (SA), a data augmentation …

Mining hard samples locally and globally for improved speech separation

K Wang, Y Peng, H Huang, Y Hu… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Speech separation datasets typically consist of hard and non-hard samples, with the former
in the minority and the latter in the majority. The data imbalance problem biases the model towards non …

Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments

H Bae, B Kang, J Kim, J Hwang, H Sung… - arXiv preprint arXiv …, 2025 - arxiv.org
This study emphasizes the significance of exploring distance-based source separation
(DSS) in outdoor environments. Unlike existing studies that primarily focus on indoor …

[BOOK][B] Time-domain Deep Neural Networks for Speech Separation

T Sun - 2022 - search.proquest.com
Speech separation separates the speech of interest from background noise (speech
enhancement) or interfering speech (speaker separation). While the human auditory system …

Individualized Conditioning and Negative Distances for Speaker Separation

T Sun, N Abuhajar, S Gong, Z Wang… - 2022 21st IEEE …, 2022 - ieeexplore.ieee.org
Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we
propose two speaker-aware designs to improve the existing speaker separation solutions …

From source separation to compositional music generation

E Postolache - 2024 - iris.uniroma1.it
This thesis proposes a journey into sound processing through deep learning, particularly
generative models, exploring the compositional structure of sound, which is layered in …