Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks

M Kolbæk, D Yu, ZH Tan… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …

Spex: Multi-scale time domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - IEEE/ACM transactions on audio …, 2020 - ieeexplore.ieee.org
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …

A review on speech separation in cocktail party environment: challenges and approaches

J Agrawal, M Gupta, H Garg - Multimedia Tools and Applications, 2023 - Springer
The Cocktail party problem, which is tracing and identifying a specific speaker's speech
while numerous speakers communicate concurrently is one of the crucial problems still to be …

A comprehensive study of speech separation: spectrogram vs waveform separation

F Bahmaninezhad, J Wu, R Gu, SX Zhang, Y Xu… - ar…

Typing to listen at the cocktail party: Text-guided target speaker extraction

X Hao, J Wu, J Yu, C Xu, KC Tan - arXiv preprint arXiv:2310.07284, 2023 - arxiv.org
Humans possess an extraordinary ability to selectively focus on the sound source of interest
amidst complex acoustic environments, commonly referred to as cocktail party scenarios. In …