Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
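To make "who spoke when" concrete, a diarization output can be held as a list of time-stamped segments, each carrying an anonymous speaker label. The sketch below is a minimal illustration under that assumption. The `Segment` type, the `spk0`/`spk1` labels, and the helper functions are hypothetical and do not follow any particular file format or toolkit.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds (exclusive)
    speaker: str   # anonymous cluster label such as "spk0" (hypothetical)


def who_spoke_at(segments, t):
    """Return the sorted speaker labels active at time t (several if speech overlaps)."""
    return sorted({s.speaker for s in segments if s.start <= t < s.end})


def talk_time(segments):
    """Sum the speaking time per speaker label."""
    totals = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Toy hypothesis with one overlap between 4.0 s and 4.5 s.
hyp = [Segment(0.0, 4.5, "spk0"), Segment(4.0, 9.0, "spk1"), Segment(9.0, 12.0, "spk0")]
print(who_spoke_at(hyp, 4.2))  # ['spk0', 'spk1']
print(talk_time(hyp))          # {'spk0': 7.5, 'spk1': 5.0}
```

The labels are arbitrary cluster identities rather than real names, which matches the snippet's framing of classes that "correspond to" speaker identity.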

In defence of metric learning for speaker recognition

JS Chung, J Huh, S Mun, M Lee, HS Heo… - arXiv preprint arXiv …, 2020 - arxiv.org
The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal
embeddings should be able to condense information into a compact utterance-level …
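In open-set recognition the test speakers are unseen, so a trial is commonly decided by comparing two utterance-level embeddings with a similarity score rather than by a closed-set classifier. The sketch below assumes cosine similarity and a fixed decision threshold. Both the threshold value and the toy 3-dimensional vectors are hypothetical, and this is not a claim about the specific scoring used in the paper.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embeddings given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(enroll, test, threshold=0.5):
    """Accept the trial if the score reaches a (hypothetical) threshold."""
    return cosine_score(enroll, test) >= threshold


# Toy embeddings; real systems use vectors with hundreds of dimensions.
enroll = [0.9, 0.1, 0.2]
test_same = [0.8, 0.2, 0.1]
test_other = [-0.1, 0.9, 0.3]
print(same_speaker(enroll, test_same))   # True
print(same_speaker(enroll, test_other))  # False
```

Because scoring only needs the two embeddings, new speakers can be enrolled without retraining; in practice the threshold is tuned on held-out trials.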

Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings

E Cooper, CI Lai, Y Yasuda, F Fang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
While speaker adaptation for end-to-end speech synthesis using speaker embeddings can
produce good speaker similarity for speakers seen during training, there remains a gap for …

A review on speaker recognition: Technology and challenges

RM Hanifa, K Isa, S Mohamad - Computers & Electrical Engineering, 2021 - Elsevier
Voice is a behavioral biometric that conveys information related to a person's traits, such as
the speaker's ethnicity, age, gender, and feeling. Speaker recognition deals with recognizing …

TitaNet: Neural model for speaker representation with 1d depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …
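A 1D depth-wise separable convolution splits a regular convolution into a depth-wise step, where each input channel gets its own k-tap filter, followed by a point-wise (1x1) step that mixes channels. The sketch below shows a plain-Python forward pass (valid padding, stride 1, no bias) and the resulting parameter savings. The layer sizes are hypothetical, and TitaNet's Squeeze-and-Excitation and global-context components are not modeled here.

```python
def depthwise_separable_conv1d(x, dw, pw):
    """x: [C_in][T] input, dw: [C_in][K] depth-wise filters, pw: [C_out][C_in] 1x1 weights."""
    c_in, t = len(x), len(x[0])
    k = len(dw[0])
    t_out = t - k + 1
    # Depth-wise step: each channel is filtered independently (cross-correlation).
    h = [[sum(x[c][i + j] * dw[c][j] for j in range(k)) for i in range(t_out)]
         for c in range(c_in)]
    # Point-wise step: a 1x1 convolution mixes channels at every time step.
    return [[sum(pw[o][c] * h[c][i] for c in range(c_in)) for i in range(t_out)]
            for o in range(len(pw))]


def standard_conv1d_params(c_in, c_out, k):
    """Weights in a regular 1D convolution (bias omitted)."""
    return c_in * c_out * k


def separable_conv1d_params(c_in, c_out, k):
    """Depth-wise weights plus point-wise weights (bias omitted)."""
    return c_in * k + c_in * c_out


x = [[1, 2, 3, 4], [0, 1, 0, 1]]
dw = [[1, 1], [1, -1]]
pw = [[1, 1], [1, -1]]
print(depthwise_separable_conv1d(x, dw, pw))   # [[2, 6, 6], [4, 4, 8]]
print(standard_conv1d_params(256, 256, 3))     # 196608
print(separable_conv1d_params(256, 256, 3))    # 66304
```

For this hypothetical 256-channel, kernel-3 layer, the separable form needs roughly a third of the weights, and the gap widens as the kernel grows.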

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

R Pappagari, T Wang, J Villalba… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this work, we explore the dependencies between speaker recognition and emotion
recognition. We first show that knowledge learned for speaker recognition can be reused for …

Gender and age estimation methods based on speech using deep neural networks

D Kwasny, D Hemmerling - Sensors, 2021 - mdpi.com
The speech signal contains a vast spectrum of information about the speaker such as
speakers' gender, age, accent, or health state. In this paper, we explored different …

Densely Connected Time Delay Neural Network for Speaker Verification

YQ Yu, WJ Li - Interspeech, 2020 - cs.nju.edu.cn
Time delay neural network (TDNN) has been widely used in speaker verification tasks.
Recently, two TDNN-based models, including extended TDNN (E-TDNN) and factorized …
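A TDNN layer computes its output at frame t from input frames spliced at fixed relative offsets, which is equivalent to a (possibly dilated) 1D convolution over time. Stacking layers widens the temporal context an output frame can see. The sketch below shows the splicing step and the total context of a stack. The offsets are illustrative, modeled on commonly used x-vector frame-level layers, and the dense connections of the D-TDNN paper are not reproduced.

```python
def splice(frames, t, offsets):
    """Concatenate the feature vectors at t + offset: a TDNN layer's input at time t."""
    out = []
    for o in offsets:
        out.extend(frames[t + o])
    return out


def total_context(layer_offsets):
    """Number of input frames seen by one output frame of a stack of TDNN layers.

    Each layer adds (max offset - min offset) frames to the span of the layer below.
    """
    span = 0
    for offsets in layer_offsets:
        span += max(offsets) - min(offsets)
    return span + 1


# Illustrative offsets for five frame-level layers.
layers = [[-2, -1, 0, 1, 2], [-2, 0, 2], [-3, 0, 3], [0], [0]]
print(total_context(layers))  # 15

frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(splice(frames, 2, [-2, 0, 2]))  # [0.0, 2.0, 4.0]
```

Widely spaced offsets such as {-3, 0, 3} enlarge the context without adding weights for every intermediate frame.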

VoiceID loss: Speech enhancement for speaker verification

S Shon, H Tang, J Glass - arXiv preprint arXiv:1904.03601, 2019 - arxiv.org
In this paper, we propose VoiceID loss, a novel loss function for training a speech
enhancement model to improve the robustness of speaker verification. In contrast to the …