Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Deep speaker embeddings for speaker verification: Review and experimental comparison

M Jakubec, R Jarina, E Lieskovska, P Kasak - Engineering Applications of …, 2024 - Elsevier
The construction of speaker-specific acoustic models for automatic speaker recognition is
almost exclusively based on deep neural network-based speaker embeddings. This work …

MFA: TDNN with multi-scale frequency-channel attention for text-independent speaker verification with short utterances

T Liu, RK Das, KA Lee, H Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
The time delay neural network (TDNN) represents one of the state-of-the-art of neural
solutions to text-independent speaker verification. However, they require a large number of …

Multi-query multi-head attention pooling and inter-topk penalty for speaker verification

M Zhao, Y Ma, Y Ding, Y Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This paper describes the multi-query multi-head attention (MQMHA) pooling and inter-topK
penalty methods which were first proposed in our submitted system description for VoxCeleb …

Scoring of large-margin embeddings for speaker verification: Cosine or PLDA?

Q Wang, KA Lee, T Liu - arXiv preprint arXiv:2204.03965, 2022 - arxiv.org
The emergence of large-margin softmax cross-entropy losses in training deep speaker
embedding neural networks has triggered a gradual shift from parametric back-ends to a …

Duality temporal-channel-frequency attention enhanced speaker representation learning

L Zhang, Q Wang, L Xie - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
The use of channel-wise attention in CNN based speaker representation networks has
achieved remarkable performance in speaker verification (SV). But these approaches do …

Cosine Scoring with Uncertainty for Neural Speaker Embedding

Q Wang, KA Lee - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
Uncertainty modeling in speaker representation aims to learn the variability present in
speech utterances. While the conventional cosine-scoring is computationally efficient and …

Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network

Y Wu, L Wang, KA Lee, M Liu, J Dang - Interspeech, 2021 - isca-archive.org
Recently, increasing attention has been paid to the joint training of upstream and
downstream tasks, and to address the challenge of how to synchronize various loss …

RSKNet-MTSP: Effective and portable deep architecture for speaker verification

Y Wu, C Guo, J Zhao, X **, J Xu - Neurocomputing, 2022 - Elsevier
The convolutional neural network (CNN) based approaches have shown great success for
speaker verification (SV) tasks, where modeling long temporal context and reducing …

Adaptive margin circle loss for speaker verification

R Xiao - arXiv preprint arXiv:2106.08004, 2021 - arxiv.org
Deep-Neural-Network (DNN) based speaker verification systems use the angular softmax
loss with margin penalties to enhance the intra-class compactness of speaker embeddings …