Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …
Deep speaker embeddings for Speaker Verification: Review and experimental comparison
The construction of speaker-specific acoustic models for automatic speaker recognition is
almost exclusively based on deep neural network-based speaker embeddings. This work …
StyleTalk: One-shot talking head generation with controllable speaking styles
Different people speak with diverse personalized speaking styles. Although existing one-
shot talking head methods have made significant progress in lip sync, natural facial …
TitaNet: Neural model for speaker representation with 1d depth-wise separable convolutions and global context
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …
SAMU-XLSR: Semantically-aligned multimodal utterance-level cross-lingual speech representation
We propose the SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
Speech Representation learning framework. Unlike previous works on speech representation …
DreamTalk: When expressive talking head generation meets diffusion probabilistic models
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …
S2VC: A framework for any-to-any voice conversion with self-supervised pretrained representations
Any-to-any voice conversion (VC) aims to convert the timbre of utterances from and to any
speakers seen or unseen during training. Various any-to-any VC approaches have been …
Utilizing self-supervised representations for MOS prediction
Speech quality assessment has been a critical issue in speech processing for decades.
Existing automatic evaluations usually require clean references or parallel ground truth data …
Self-supervised speaker verification using dynamic loss-gate and label correction
For self-supervised speaker verification, the quality of pseudo labels decides the upper
bound of the system due to the massive unreliable labels. In this work, we propose dynamic …
Multi-view self-attention based transformer for speaker recognition
Initially developed for natural language processing (NLP), Transformer model is now widely
used for speech processing tasks such as speaker recognition, due to its powerful sequence …