SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

F Landini, J Profant, M Diez, L Burget - Computer Speech & Language, 2022 - Elsevier
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …

Deep speaker recognition: Process, progress, and challenges

AQ Ohi, MF Mridha, MA Hamid, MM Monowar - IEEE Access, 2021 - ieeexplore.ieee.org
Speaker recognition is related to human biometrics dealing with the identification of
speakers from their speech. Speaker recognition is an active research area and being …

TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …

ECAPA-TDNN embeddings for speaker diarization

N Dawalatabad, M Ravanelli, F Grondin… - arXiv preprint arXiv …, 2021 - arxiv.org
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural
networks can accurately capture speaker discriminative characteristics and popular deep …

Meta-generalization for domain-invariant speaker verification

H Zhang, L Wang, KA Lee, M Liu… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Automatic speaker verification (ASV) exhibits unsatisfactory performance under domain
mismatch conditions owing to intrinsic and extrinsic factors, such as variations in speaking …

Combination of deep speaker embeddings for diarisation

G Sun, C Zhang, PC Woodland - Neural Networks, 2021 - Elsevier
Significant progress has recently been made in speaker diarisation after the introduction of
d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for …

Conversations in the wild: Data collection, automatic generation and evaluation

N Zaheer, AA Raza, M Shabbir - Computer Speech & Language, 2025 - Elsevier
The aim of conversational speech processing is to analyze human conversations in natural
settings. It finds numerous applications in personality traits identification, speech therapy …

Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning

K VijayKumar - Data & Knowledge Engineering, 2023 - Elsevier
Speaker diarization is the partitioning of an audio source stream into homogeneous
segments according to the speaker's identity. It can improve the readability of an automatic …

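A survey of data augmentation methods for sequence data

葛轶洲, 许翔, 杨锁荣, 周青… - Journal of Frontiers of …, 2021 - search.ebscohost.com
In pursuit of accuracy, deep learning model architectures have become increasingly complex and
networks increasingly deep. A growing parameter count means that training a model requires more
data. However, manually annotating data is costly and, constrained by objective factors …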