A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Neural target speech extraction: An overview
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …
Voicefilter: Targeted voice separation by speaker-conditioned spectrogram masking
Q Wang, H Muckenhirn, K Wilson, P Sridhar…
… speakers is one of the
challenging problems with regards to today's automatic speech recognition systems …
Spex: Multi-scale time domain speaker extraction network
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …
Speech enhancement using self-adaptation and multi-head self-attention
This paper investigates a self-adaptation method for speech enhancement using auxiliary
speaker-aware features; we extract a speaker representation used for adaptation directly …
Spex+: A complete time domain speaker extraction network
Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …