A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Audio anti-spoofing detection: A survey

M Li, Y Ahmadiadli, XP Zhang - arXiv preprint arXiv:2404.13914, 2024 - arxiv.org
The availability of smart devices leads to an exponential increase in multimedia content.
However, the rapid advancements in deep learning have given rise to sophisticated …

Betray oneself: A novel audio deepfake detection model via mono-to-stereo conversion

R Liu, J Zhang, G Gao, H Li - arXiv preprint arXiv:2305.16353, 2023 - arxiv.org
Audio Deepfake Detection (ADD) aims to detect the fake audio generated by text-to-speech
(TTS), voice conversion (VC) and replay, etc., which is an emerging topic. Traditionally we …

Distance metric-based open-set domain adaptation for speaker verification

J Li, J Han, F Qian, T Zheng, Y He… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Domain shift poses a significant challenge in speaker verification, especially in open-set
scenarios where the speaker categories are disjoint between the source and target …

Graph attention-based deep embedded clustering for speaker diarization

Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier
Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …

A Survey on Speech Deepfake Detection

M Li, Y Ahmadiadli, XP Zhang - ACM Computing Surveys, 2025 - dl.acm.org
The availability of smart devices leads to an exponential increase in multimedia content.
However, advancements in deep learning have also enabled the creation of highly …

Speaker verification using attentive multi-scale convolutional recurrent network

Y Li, Z Jiang, W Cao, Q Huang - Applied Soft Computing, 2022 - Elsevier
In this paper, we propose a speaker verification method by an Attentive Multi-scale
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …

DS-TDNN: Dual-stream time-delay neural network with global-aware filter for speaker verification

Y Li, J Gan, X Lin, Y Qiu, H Zhan… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Conventional time-delay neural networks (TDNNs) struggle to handle long-range context,
their ability to represent speaker information is therefore limited for long utterances. Existing …

Class token and knowledge distillation for multi-head self-attention speaker verification systems

V Mingote, A Miguel, A Ortega, E Lleida - Digital Signal Processing, 2023 - Elsevier
This paper explores three novel approaches to improve the performance of speaker
verification (SV) systems based on deep neural networks (DNN) using Multi-head Self …

Two methods for spoofing-aware speaker verification: Multi-layer perceptron score fusion model and integrated embedding projector

J Heo, J Kim, H Shin - arXiv preprint arXiv:2206.13807, 2022 - arxiv.org
The use of deep neural networks (DNN) has dramatically elevated the performance of
automatic speaker verification (ASV) over the last decade. However, ASV systems can be …