Google Scholar

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementatio...

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 242 · All 7 versions

[PDF] arxiv.org

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 430 · All 7 versions
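
To make "who spoke when" concrete, diarization output can be written as (start, end, speaker) segments. The following Python sketch is illustrative only and is not taken from the review. It assumes one speaker label per frame, so overlapping speech is not handled, and it uses None for non-speech. The function name frames_to_segments and the 10 ms default frame step are hypothetical.

```python
from itertools import groupby


def frames_to_segments(labels, frame_step=0.01):
    """Collapse per-frame speaker labels into (start, end, speaker) segments.

    labels: one speaker label per frame; None marks non-speech.
    frame_step: frame hop in seconds (hypothetical 10 ms default).
    Assumes at most one speaker per frame, i.e. no overlapping speech.
    """
    segments = []
    frame = 0
    for speaker, run in groupby(labels):  # groups consecutive equal labels
        length = sum(1 for _ in run)
        if speaker is not None:
            start = round(frame * frame_step, 3)
            end = round((frame + length) * frame_step, 3)
            segments.append((start, end, speaker))
        frame += length
    return segments


print(frames_to_segments(["A", "A", "A", None, "B", "B", "A"], frame_step=0.5))
# [(0.0, 1.5, 'A'), (2.0, 3.0, 'B'), (3.0, 3.5, 'A')]
```

Overlapping speech needs a representation with more than one label per frame, which is the setting of the multi-label and powerset formulations discussed in the diarization literature.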

[PDF] hal.science

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …

Cited by 146 · All 15 versions

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1871 · All 7 versions

[PDF] thecvf.com

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 1015 · All 20 versions

[PDF] arxiv.org

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Cited by 794 · All 5 versions

[PDF] arxiv.org

Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org

Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …

Cited by 123 · All 9 versions
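
This abstract contrasts a frame-wise multi-label framing with a multi-class one. The contrast can be illustrated by mapping each frame's per-speaker activity vector to a single powerset class, meaning one subset of simultaneously active speakers. The following Python sketch is an illustration under stated assumptions and is not the authors' code. It assumes a fixed number of speakers and a cap (max_overlap) on how many may speak at once. The helper names are hypothetical.

```python
from itertools import combinations


def powerset_classes(num_speakers, max_overlap):
    """All subsets of speakers with at most max_overlap members.

    Index 0 is the empty set (non-speech), then single speakers, then pairs, etc.
    """
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(frozenset(c) for c in combinations(range(num_speakers), size))
    return classes


def multilabel_to_powerset(frame, classes):
    """Map a 0/1 activity vector (one entry per speaker) to a powerset class index."""
    active = frozenset(i for i, v in enumerate(frame) if v)
    # list.index raises ValueError if more than max_overlap speakers are active.
    return classes.index(active)


def powerset_to_multilabel(index, classes, num_speakers):
    """Inverse mapping: powerset class index back to a 0/1 activity vector."""
    return [1 if i in classes[index] else 0 for i in range(num_speakers)]


classes = powerset_classes(num_speakers=3, max_overlap=2)
frames = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
print(len(classes))                                          # 7
print([multilabel_to_powerset(f, classes) for f in frames])  # [0, 1, 4, 6]
print(powerset_to_multilabel(6, classes, 3))                 # [0, 1, 1]
```

With 3 speakers and at most 2 overlapping, there are 1 + 3 + 3 = 7 classes. Each frame then carries exactly one class label, so a standard multi-class (softmax) cross-entropy can be used in place of per-speaker binary losses.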

[PDF] ieee.org

K-means and alternative clustering methods in modern power systems

SM Miraftabzadeh, CG Colombo, M Longo… - IEEE …, 2023 - ieeexplore.ieee.org

As power systems evolve by integrating renewable energy sources, distributed generation,
and electric vehicles, the complexity of managing these systems increases. With the …

Cited by 66 · All 6 versions

[PDF] arxiv.org

TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …

Cited by 126 · All 2 versions
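
A commonly cited benefit of depth-wise separable convolutions is a smaller parameter count. The following Python sketch makes a rough count with biases ignored. It compares a standard 1D convolution against a depth-wise convolution (one kernel per channel) followed by a 1x1 point-wise convolution. The function names and the layer size used (512 channels, kernel 3) are hypothetical and are not TitaNet's actual configuration.

```python
def conv1d_weights(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a standard 1D convolution (bias terms ignored)."""
    return c_in * c_out * kernel


def separable_conv1d_weights(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a depth-wise 1D conv (one kernel per input channel)
    followed by a 1x1 point-wise conv that mixes channels (bias terms ignored)."""
    depthwise = c_in * kernel
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
c, k = 512, 3
print(conv1d_weights(c, c, k))            # 786432
print(separable_conv1d_weights(c, c, k))  # 263680
```

Because the depth-wise term grows only linearly with the kernel size, the relative saving increases for wider kernels.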

[PDF] arxiv.org

ECAPA-TDNN embeddings for speaker diarization

N Dawalatabad, M Ravanelli, F Grondin… - arXiv preprint arXiv …, 2021 - arxiv.org

Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural
networks can accurately capture speaker discriminative characteristics and popular deep …

Cited by 127 · All 10 versions
