Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Deep speaker embeddings for Speaker Verification: Review and experimental comparison

M Jakubec, R Jarina, E Lieskovska, P Kasak - Engineering Applications of …, 2024 - Elsevier
The construction of speaker-specific acoustic models for automatic speaker recognition is
almost exclusively based on deep neural network-based speaker embeddings. This work …

StyleTalk: One-shot talking head generation with controllable speaking styles

Y Ma, S Wang, Z Hu, C Fan, T Lv, Y Ding… - Proceedings of the …, 2023 - ojs.aaai.org
Different people speak with diverse personalized speaking styles. Although existing
one-shot talking head methods have made significant progress in lip sync, natural facial …

TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …

SAMU-XLSR: Semantically-aligned multimodal utterance-level cross-lingual speech representation

S Khurana, A Laurent, J Glass - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
We propose the SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
Speech Representation learning framework. Unlike previous works on speech representation …

DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

S2VC: A framework for any-to-any voice conversion with self-supervised pretrained representations

J Lin, YY Lin, CM Chien, H Lee - arXiv preprint arXiv:2104.02901, 2021 - arxiv.org
Any-to-any voice conversion (VC) aims to convert the timbre of utterances from and to any
speakers seen or unseen during training. Various any-to-any VC approaches have been …

Utilizing self-supervised representations for MOS prediction

WC Tseng, C Huang, WT Kao, YY Lin, H Lee - arXiv preprint arXiv …, 2021 - arxiv.org
Speech quality assessment has been a critical issue in speech processing for decades.
Existing automatic evaluations usually require clean references or parallel ground truth data …

Self-supervised speaker verification using dynamic loss-gate and label correction

B Han, Z Chen, Y Qian - arXiv preprint arXiv:2208.01928, 2022 - arxiv.org
For self-supervised speaker verification, the quality of pseudo labels decides the upper
bound of the system due to the massive unreliable labels. In this work, we propose dynamic …

Multi-view self-attention based transformer for speaker recognition

R Wang, J Ao, L Zhou, S Liu, Z Wei, T Ko… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Initially developed for natural language processing (NLP), the Transformer model is now widely
used for speech processing tasks such as speaker recognition, due to its powerful sequence …