Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …
A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges
Distance metric learning is a branch of machine learning that aims to learn distances from
the data, which enhances the performance of similarity-based algorithms. This tutorial …
On joint optimization of automatic speaker verification and anti-spoofing in the embedding space
Biometric systems are exposed to spoofing attacks which may compromise their security,
and voice biometrics based on automatic speaker verification (ASV), is no exception. To …
End-to-end speaker verification via curriculum bipartite ranking weighted binary cross-entropy
End-to-end speaker verification achieves the verification through estimating directly the
similarity score between a pair of utterances, which is formulated as a binary (i.e., target …
Optimizing two-way partial AUC with an end-to-end framework
The Area Under the ROC Curve (AUC) is a crucial metric for machine learning, which
evaluates the average performance over all possible True Positive Rates (TPRs) and False …
Generalizing AUC optimization to multiclass classification for audio segmentation with limited training data
Area under the ROC curve (AUC) optimisation techniques developed for neural networks
have recently demonstrated their capabilities in different audio and speech related tasks …
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
Personalized voice triggering is a key technology in voice assistants and serves as the first
step for users to activate the voice assistant. Personalized voice triggering involves keyword …
Deep ad-hoc beamforming
XL Zhang - Computer Speech & Language, 2021 - Elsevier
Far-field speech processing is an important and challenging problem. In this paper, we
propose deep ad-hoc beamforming, a deep-learning-based multichannel speech …
Metrics space and norm: Taxonomy to distance metrics
A lot of machine learning algorithms, including clustering methods such as K‐nearest
neighbor (KNN), highly depend on the distance metrics to understand the data pattern well …
A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges (with appendices on mathematical …
Distance metric learning is a branch of machine learning that aims to learn distances from
the data, which enhances the performance of similarity-based algorithms. This tutorial …