Google Scholar

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 432. All versions: 7

Survey of deep learning paradigms for speech processing

KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer

Over the past decades, a particular focus is given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …

Cited by 96. All versions: 6

The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios

S Cornell, M Wiesner, S Watanabe, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org

The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …

Cited by 59. All versions: 7

Streaming multi-talker ASR with token-level serialized output training

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …

Cited by 63. All versions: 4

Attention-based encoder-decoder end-to-end neural diarization with embedding enhancer

Z Chen, B Han, S Wang, Y Qian - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Deep neural network-based systems have significantly improved the performance of
speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often …

Cited by 22. All versions: 7

GPU-accelerated guided source separation for meeting transcription

D Raj, D Povey, S Khudanpur - arXiv preprint arXiv:2212.05271, 2022 - arxiv.org

Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …

Cited by 38. All versions: 11

NOTSOFAR-1 challenge: New datasets, baseline, and tasks for distant meeting transcription

A Vinnikov, A Ivry, A Hurvitz, I Abramovski… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings
(“NOTSOFAR-1”) Challenge alongside datasets and baseline system. The challenge …

Cited by 21. All versions: 5

One model to rule them all? Towards end-to-end joint speaker diarization and speech recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

Cited by 19. All versions: 4

On word error rate definitions and their efficient computation for multi-speaker speech recognition systems

T von Neumann, C Boeddeker… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

We propose a general framework to compute the word error rate (WER) of ASR systems that
process recordings containing multiple speakers at their input and that produce multiple …

Cited by 30. All versions: 4

End-to-end speaker-attributed ASR with transformer

N Kanda, G Ye, Y Gaur, X Wang, Z Meng… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper presents our recent effort on end-to-end speaker-attributed automatic speech
recognition, which jointly performs speaker counting, speech recognition and speaker …

Cited by 53. All versions: 4

Integration of speech separation, diarization, and recognition for multi-speaker meetings:...

A review of speaker diarization: Recent advances with deep learning
