A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Spex: Multi-scale time domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - IEEE/ACM transactions on audio …, 2020 - ieeexplore.ieee.org
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …

Speech enhancement using self-adaptation and multi-head self-attention

Y Koizumi, K Yatabe, M Delcroix… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper investigates a self-adaptation method for speech enhancement using auxiliary
speaker-aware features; we extract a speaker representation used for adaptation directly …

Spex+: A complete time domain speaker extraction network

M Ge, C Xu, L Wang, ES Chng, J Dang, H Li - arxiv preprint arxiv …, 2020 - arxiv.org
Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …