- Academic Search

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Cited by 4

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Cited by 14

Speech foundation model ensembles for the controlled singing voice deepfake detection (ctrsvdd) challenge 2024

A Guragain, T Liu, Z Pan, HB Sailor… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …

Cited by 4

Multi-stage Face-voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org

The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

Cited by 2

Prompt-driven target speech diarization

Y Jiang, Z Chen, R Tao, L Deng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …

Cited by 10

Towards quantifying and reducing language mismatch effects in cross-lingual speech anti-spoofing

T Liu, I Kukanov, Z Pan, Q Wang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

The effects of language mismatch impact speech anti-spoofing systems, while investigations
and quantification of these effects remain limited. Existing anti-spoofing datasets are mainly …

Cited by 1

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

J Lin, M Ge, J Ao, L Deng, H Li - arXiv preprint arXiv:2407.02826, 2024 - arxiv.org

It was shown that pre-trained models with self-supervised learning (SSL) techniques are
effective in various downstream speech tasks. However, most such models are trained on …

Cited by 1

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning

R Shi, K Itoyama, K Nakadai - arXiv preprint arXiv:2412.20146, 2024 - arxiv.org

This paper addresses the extraction of the bird vocalization embedding from the whole song
level using disentangled representation learning (DRL). Bird vocalization embeddings are …

Cited by 1

Enhanced text-independent speaker recognition using MFCC, Bi-LSTM, and CNN-based noise removal techniques

M Tiwari, DK Verma - International Journal of Speech Technology, 2024 - Springer

This research article introduces a novel approach to text-independent speaker recognition
by integrating Mel-Frequency Cepstral Coefficients (MFCC) and Bidirectional Long Short …

Cited by 1

Introducing Euclidean Distance Optimization into Softmax Loss under Neural Collapse

Q Zhang, X Zhang, J Yang, M Sun, T Cao - Pattern Recognition, 2025 - Elsevier

The choice of loss function is crucial in training convolutional neural networks (CNNs).
Cross-entropy loss with Softmax and its variations have demonstrated excellent performance in …

Disentangling voice and content with self-supervision for speaker recognition