DPCCN: Densely-connected pyramid complex convolutional network for robust speech separation and extraction

J Han, Y Long, L Burget… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In recent years, a number of time-domain speech separation methods have been proposed.
However, most of them are very sensitive to the environments and wide domain coverage …

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Y Hsu, Y Lee, MR Bai - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world
applications, speech quality can deteriorate due to, for example, background …

Attention-based scaling adaptation for target speech extraction

J Han, W Rao, Y Long, J Liang - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Target speech extraction has attracted widespread attention in recent years. In this work,
we focus on investigating the dynamic interaction between different mixtures and the target …

Heterogeneous separation consistency training for adaptation of unsupervised speech separation

J Han, Y Long - EURASIP Journal on Audio, Speech, and Music …, 2023 - Springer
Recently, supervised speech separation has made great progress. However, limited by the
nature of supervised training, most existing separation methods require ground-truth …

Learning-based robust speaker counting and separation with the aid of spatial coherence

Y Hsu, MR Bai - EURASIP Journal on Audio, Speech, and Music …, 2023 - Springer
A three-stage approach is proposed for speaker counting and speech separation in noisy
and reverberant environments. In the spatial feature extraction, a spatial coherence matrix …

Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

Y Hsu, Y Lee, MR Bai - arXiv preprint arXiv:2207.08126, 2022 - arxiv.org
Recently, speech enhancement technologies that are based on deep learning have
received considerable research attention. If the spatial information in microphone signals is …

Array configuration-agnostic personalized speech enhancement using long-short-term spatial coherence

Y Hsu, Y Lee, MR Bai - The Journal of the Acoustical Society of …, 2023 - pubs.aip.org
Personalized speech enhancement (PSE) has been a field of active research for
suppression of speech-like interferers, such as competing speakers or television (TV) …

Spatial-temporal activity-informed diarization and separation

Y Hsu, S Chen, Y Lai, C Wang, MR Bai - The Journal of the Acoustical …, 2025 - pubs.aip.org
A robust multichannel speaker diarization and separation system is proposed by exploiting
the spatiotemporal activity of the speakers. The system is realized in a hybrid architecture …

VocEmb4SVS: Improving singing voice separation with vocal embeddings

C Li, Y Li, X Du, Y Ju, S Hu, Z Wu - 2022 Asia-Pacific Signal …, 2022 - ieeexplore.ieee.org
Deep learning-based methods have shown promising performance on singing voice
separation (SVS). Recently, embeddings related to lyrics and voice activities have been …

Boosting the Performance of SpEx+ by Attention and Contextual Mechanism

C Li, Z Wu, W Rao, Y Wang… - 2022 13th International …, 2022 - ieeexplore.ieee.org
Target speaker extraction (TSE) aims to mimic human selective attention by extracting the
voice of interest from a multi-talker environment. Time-domain methods represented by …