Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Speaker anonymization using orthogonal householder neural network

X Miao, X Wang, E Cooper, J Yamagishi… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Speaker anonymization aims to conceal a speaker's identity while preserving content
information in speech. Current mainstream neural-network speaker anonymization systems …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Language-independent speaker anonymization approach using self-supervised pre-trained models

X Miao, X Wang, E Cooper, J Yamagishi… - arXiv preprint arXiv …, 2022 - arxiv.org
Speaker anonymization aims to protect the privacy of speakers while preserving spoken
linguistic information from speech. Current mainstream neural network speaker …

Multi-level attention network: Mixed time–frequency channel attention and multi-scale self-attentive standard deviation pooling for speaker recognition

L Deng, F Deng, K Zhou, P Jiang, G Zhang… - … Applications of Artificial …, 2024 - Elsevier
In this paper, we propose a more efficient lightweight speaker recognition network, the multi-
level attention network (MANet). MANet aims to generate more robust and discriminative …

RSKNet-MTSP: Effective and portable deep architecture for speaker verification

Y Wu, C Guo, J Zhao, X Jin, J Xu - Neurocomputing, 2022 - Elsevier
The convolutional neural network (CNN) based approaches have shown great success for
speaker verification (SV) tasks, where modeling long temporal context and reducing …

Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation

X Miao, Y Zhang, X Wang, N Tomashenko… - arXiv preprint arXiv …, 2024 - arxiv.org
A general disentanglement-based speaker anonymization system typically separates
speech into content, speaker, and prosody features using individual encoders. This paper …

ResSKNet-SSDP: effective and light end-to-end architecture for speaker recognition

F Deng, L Deng, P Jiang, G Zhang, Q Yang - Sensors, 2023 - mdpi.com
In speaker recognition tasks, convolutional neural network (CNN)-based approaches have
shown significant success. Modeling the long-term contexts and efficiently aggregating the …

Explore long-range context features for speaker verification

Z Li, Z Zhao, W Wang, P Zhang, Q Zhao - Applied Sciences, 2023 - mdpi.com
Multi-scale context information, especially long-range dependency, has shown to be
beneficial for speaker verification (SV) tasks. In this paper, we propose three methods to …

Multimodal modeling for spoken language identification

S Bharadwaj, M Ma, S Vashishth… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Spoken language identification refers to the task of automatically predicting the spoken
language in a given utterance. Conventionally, it is modeled as a speech-based language …