Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

ASVspoof 5: Crowdsourced speech data, deepfakes, and adversarial attacks at scale

X Wang, H Delgado, H Tak, J Jung, H Shim… - arXiv preprint arXiv …, 2024 - arxiv.org
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech
spoofing and deepfake attacks, and the design of detection solutions. Compared to previous …

An enhanced Res2Net with local and global feature fusion for speaker verification

Y Chen, S Zheng, H Wang, L Cheng, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective fusion of multi-scale features is crucial for improving speaker verification
performance. While most existing methods aggregate multi-scale features in a layer-wise …

ESPnet-SPK: Full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

J Jung, W Zhang, J Shi, Z Aldeneh, T Higuchi… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training
speaker embedding extractors. First, we provide an open-source platform for researchers in …

Whisper-SV: Adapting Whisper for low-data-resource speaker verification

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Speech Communication, 2024 - Elsevier
Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Leveraging ASR pretrained conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

H Shim, J Jung, T Kinnunen, N Evans… - arXiv preprint arXiv …, 2024 - arxiv.org
Spoofing detection is today a mainstream research topic. Standard metrics can be applied to
evaluate the performance of isolated spoofing detection solutions and others have been …

Branch-ECAPA-TDNN: A parallel branch architecture to capture local and global features for speaker verification

J Yao, C Liang, Z Peng, B Zhang, XL Zhang - Proc. Interspeech, 2023 - xiaolei-zhang.net
Currently, ECAPA-TDNN is one of the state-of-the-art deep models for automatic speaker
verification (ASV). However, it focuses too much on local feature extraction with fixed local …

VoxBlink: A large scale speaker verification dataset on camera

Y Lin, X Qin, G Zhao, M Cheng, N Jiang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a large-scale and high-quality audiovisual speaker verification
dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data …