Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Deep learning for biometrics: A survey

K Sundararajan, DL Woodard - ACM Computing Surveys (CSUR), 2018 - dl.acm.org
In the recent past, deep learning methods have demonstrated remarkable success for
supervised learning tasks in multiple domains including computer vision, natural language …

Speaker recognition from raw waveform with SincNet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

X-vectors: Robust DNN embeddings for speaker recognition

D Snyder, D Garcia-Romero, G Sell… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
In this paper, we use data augmentation to improve performance of deep neural network
(DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate …

Deep neural network embeddings for text-independent speaker verification.

D Snyder, D Garcia-Romero, D Povey, S Khudanpur - Interspeech, 2017 - isca-archive.org
This paper investigates replacing i-vectors for text-independent speaker verification with
embeddings extracted from a feedforward deep neural network. Long-term speaker …

Speaker recognition for multi-speaker conversations using x-vectors

D Snyder, D Garcia-Romero, G Sell… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Recently, deep neural networks that map utterances to fixed-dimensional embeddings have
emerged as the state-of-the-art in speaker recognition. Our prior work introduced x-vectors …

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Neural voice cloning with a few samples

S Arik, J Chen, K Peng, W Ping… - Advances in neural …, 2018 - proceedings.neurips.cc
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a
neural voice cloning system that learns to synthesize a person's voice from only a few audio …

Deep Speaker: an end-to-end neural speaker embedding system

C Li, X Ma, B Jiang, X Li, X Zhang, X Liu, Y Cao… - arXiv preprint arXiv …, 2017 - arxiv.org
We present Deep Speaker, a neural speaker embedding system that maps utterances to a
hypersphere where speaker similarity is measured by cosine similarity. The embeddings …

Exploring the encoding layer and loss function in end-to-end speaker and language recognition system

W Cai, J Chen, M Li - arXiv preprint arXiv:1804.05160, 2018 - arxiv.org
In this paper, we explore the encoding/pooling layer and loss function in the end-to-end
speaker and language recognition system. First, a unified and interpretable end-to-end …