Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …
A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
In defence of metric learning for speaker recognition
The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal
embeddings should be able to condense information into a compact utterance-level …
Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings
While speaker adaptation for end-to-end speech synthesis using speaker embeddings can
produce good speaker similarity for speakers seen during training, there remains a gap for …
A review on speaker recognition: Technology and challenges
Voice is a behavioral biometric that conveys information related to a person's traits, such as
the speaker's ethnicity, age, gender, and feeling. Speaker recognition deals with recognizing …
TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …
x-vectors meet emotions: A study on dependencies between emotion and speaker recognition
In this work, we explore the dependencies between speaker recognition and emotion
recognition. We first show that knowledge learned for speaker recognition can be reused for …
Gender and age estimation methods based on speech using deep neural networks
D Kwasny, D Hemmerling - Sensors, 2021 - mdpi.com
The speech signal contains a vast spectrum of information about the speaker such as
speakers' gender, age, accent, or health state. In this paper, we explored different …
Densely Connected Time Delay Neural Network for Speaker Verification
Time delay neural network (TDNN) has been widely used in speaker verification tasks.
Recently, two TDNN-based models, including extended TDNN (E-TDNN) and factorized …
VoiceID loss: Speech enhancement for speaker verification
In this paper, we propose VoiceID loss, a novel loss function for training a speech
enhancement model to improve the robustness of speaker verification. In contrast to the …