Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024

A Guragain, T Liu, Z Pan, HB Sailor… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …

Multi-stage Face-voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org
The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

Prompt-driven target speech diarization

Y Jiang, Z Chen, R Tao, L Deng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …

Towards quantifying and reducing language mismatch effects in cross-lingual speech anti-spoofing

T Liu, I Kukanov, Z Pan, Q Wang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
The effects of language mismatch impact speech anti-spoofing systems, while investigations
and quantification of these effects remain limited. Existing anti-spoofing datasets are mainly …

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

J Lin, M Ge, J Ao, L Deng, H Li - arXiv preprint arXiv:2407.02826, 2024 - arxiv.org
It was shown that pre-trained models with self-supervised learning (SSL) techniques are
effective in various downstream speech tasks. However, most such models are trained on …

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning

R Shi, K Itoyama, K Nakadai - arXiv preprint arXiv:2412.20146, 2024 - arxiv.org
This paper addresses the extraction of the bird vocalization embedding from the whole song
level using disentangled representation learning (DRL). Bird vocalization embeddings are …

Enhanced text-independent speaker recognition using MFCC, Bi-LSTM, and CNN-based noise removal techniques

M Tiwari, DK Verma - International Journal of Speech Technology, 2024 - Springer
This research article introduces a novel approach to text-independent speaker recognition
by integrating Mel-Frequency Cepstral Coefficients (MFCC) and Bidirectional Long Short …

Introducing Euclidean Distance Optimization into Softmax Loss under Neural Collapse

Q Zhang, X Zhang, J Yang, M Sun, T Cao - Pattern Recognition, 2025 - Elsevier
The choice of loss function is crucial in training convolutional neural networks (CNNs). Cross-
entropy loss with Softmax and its variations have demonstrated excellent performance in …