Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges

JL Suárez, S García, F Herrera - Neurocomputing, 2021 - Elsevier
Distance metric learning is a branch of machine learning that aims to learn distances from
the data, which enhances the performance of similarity-based algorithms. This tutorial …

On joint optimization of automatic speaker verification and anti-spoofing in the embedding space

A Gomez-Alanis, JA Gonzalez-Lopez… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
Biometric systems are exposed to spoofing attacks which may compromise their security,
and voice biometrics based on automatic speaker verification (ASV) is no exception. To …

End-to-end speaker verification via curriculum bipartite ranking weighted binary cross-entropy

Z Bai, J Wang, XL Zhang, J Chen - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
End-to-end speaker verification achieves the verification through estimating directly the
similarity score between a pair of utterances, which is formulated as a binary (i.e., target …

Optimizing two-way partial AUC with an end-to-end framework

Z Yang, Q Xu, S Bao, Y He, X Cao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The Area Under the ROC Curve (AUC) is a crucial metric for machine learning, which
evaluates the average performance over all possible True Positive Rates (TPRs) and False …

Generalizing AUC optimization to multiclass classification for audio segmentation with limited training data

P Gimeno, V Mingote, A Ortega… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Area under the ROC curve (AUC) optimisation techniques developed for neural networks
have recently demonstrated their capabilities in different audio and speech related tasks …

Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting

X Liang, Z Zhang, R Xu - EURASIP Journal on Audio, Speech, and Music …, 2023 - Springer
Personalized voice triggering is a key technology in voice assistants and serves as the first
step for users to activate the voice assistant. Personalized voice triggering involves keyword …

Deep ad-hoc beamforming

XL Zhang - Computer Speech & Language, 2021 - Elsevier
Far-field speech processing is an important and challenging problem. In this paper, we
propose deep ad-hoc beamforming, a deep-learning-based multichannel speech …

Metrics space and norm: Taxonomy to distance metrics

B Subramanian, A Paul, J Kim… - Scientific …, 2022 - Wiley Online Library
A lot of machine learning algorithms, including clustering methods such as K‐nearest
neighbor (KNN), highly depend on the distance metrics to understand the data pattern well …

A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges (with appendices on mathematical …

JL Suárez-Díaz, S García, F Herrera - arXiv preprint arXiv:1812.05944, 2018 - arxiv.org
Distance metric learning is a branch of machine learning that aims to learn distances from
the data, which enhances the performance of similarity-based algorithms. This tutorial …