A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings
Following the success of the 1st, 2nd, 3rd, 4th, and 5th CHiME challenges, we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …
SpeechStew: Simply mix all available speech recognition data to train one large neural network
We present SpeechStew, a speech recognition model that is trained on a combination of
various publicly available speech recognition datasets: AMI, Broadcast News, Common …
End-to-end neural speaker diarization with self-attention
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems; i.e., (i) it is not …
End-to-end neural speaker diarization with permutation-free objectives
In this paper, we propose a novel end-to-end neural-network-based speaker diarization
method. Unlike most existing methods, our proposed method does not have separate …
The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios
The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …
Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
Rethinking evaluation in ASR: Are our models robust enough?
Is pushing numbers on a single benchmark valuable in automatic speech recognition?
Research results in acoustic modeling are typically evaluated based on performance on a …
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Multi-speaker speech recognition of unsegmented recordings has diverse applications such
as meeting transcription and automatic subtitle generation. With technical advances in …