Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Xi-vector embedding for speaker recognition

KA Lee, Q Wang, T Koshinaka - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
We present a Bayesian formulation for deep speaker embedding, wherein the xi-vector is
the Bayesian counterpart of the x-vector, taking into account the uncertainty estimate. On the …

SpeakerNet: 1D depth-wise separable convolutional network for text-independent speaker recognition and verification

NR Koluguri, J Li, V Lavrukhin, B Ginsburg - arXiv preprint arXiv …, 2020 - arxiv.org
We propose SpeakerNet, a new neural architecture for speaker recognition and speaker
verification tasks. It is composed of residual blocks with 1D depth-wise separable …

Target speaker verification with selective auditory attention for single and multi-talker speech

C Xu, W Rao, J Wu, H Li - IEEE/ACM Transactions on audio …, 2021 - ieeexplore.ieee.org
Speaker verification has been studied mostly under the single-talker condition. It is
adversely affected in the presence of interference speakers. Inspired by the study on target …

Understanding self-attention of self-supervised audio transformers

S Yang, AT Liu, H Lee - arXiv preprint arXiv:2006.03265, 2020 - arxiv.org
Self-supervised Audio Transformers (SAT) enable great success in many downstream
speech applications like ASR, but how they work has not been widely explored yet. In this …

Robust speaker recognition using speech enhancement and attention model

Y Shi, Q Huang, T Hain - arXiv preprint arXiv:2001.05031, 2020 - arxiv.org
In this paper, a novel architecture for speaker recognition is proposed by cascading speech
enhancement and speaker processing. Its aim is to improve speaker recognition …

H-vectors: Utterance-level speaker embedding using a hierarchical attention model

Y Shi, Q Huang, T Hain - ICASSP 2020-2020 IEEE international …, 2020 - ieeexplore.ieee.org
In this paper, a hierarchical attention network is proposed to generate utterance-level
embeddings (H-vectors) for speaker identification and verification. Since different parts of an …

A unified deep learning framework for short-duration speaker verification in adverse environments

Y Jung, Y Choi, H Lim, H Kim - IEEE Access, 2020 - ieeexplore.ieee.org
Speaker verification (SV) has recently attracted considerable research interest due to the
growing popularity of virtual assistants. At the same time, there is an increasing requirement …

Discriminative speaker embedding with serialized multi-layer multi-head attention

H Zhu, KA Lee, H Li - Speech Communication, 2022 - Elsevier
In this paper, a serialized multi-layer multi-head attention is proposed for extracting neural
speaker embedding in the text-independent speaker verification task. The majority of the recent …

Combination of deep speaker embeddings for diarisation

G Sun, C Zhang, PC Woodland - Neural Networks, 2021 - Elsevier
Significant progress has recently been made in speaker diarisation after the introduction of
d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for …