- Academic Search

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 3882 · 11 versions

Espnet-slu: Advancing spoken language understanding through espnet

S Arora, S Dalmia, P Denisov, X Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

As Automatic Speech Processing (ASR) systems are getting better, there is an increasing
interest of using the ASR output to do downstream Natural Language Processing (NLP) …

Cited by 81 · 7 versions

Fast conformer with linearly scalable attention for efficient speech recognition

D Rekesh, NR Koluguri, S Kriman… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Conformer-based models have become the dominant end-to-end architecture for speech
processing tasks. With the objective of enhancing the conformer architecture for efficient …

Cited by 81 · 3 versions

Audiobench: A universal benchmark for audio large language models

B Wang, X Zou, G Lin, S Sun, Z Liu, W Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

We introduce AudioBench, a universal benchmark designed to evaluate Audio Large
Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, among …

Cited by 18 · 2 versions

Less is more: Accurate speech recognition & translation without web-scale data

KC Puvvada, P Żelasko, H Huang, O Hrinchuk… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent advances in speech recognition and translation rely on hundreds of thousands of
hours of Internet speech data. We argue that state-of-the art accuracy can be reached …

Cited by 12 · 5 versions

A study on the integration of pre-trained ssl, asr, lm and slu models for spoken language understanding

Y Peng, S Arora, Y Higuchi, Y Ueda… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …

Cited by 26 · 5 versions

VarArray: Array-geometry-agnostic continuous speech separation

T Yoshioka, X Wang, D Wang, M Tang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Continuous speech separation using a microphone array was shown to be promising in
dealing with the speech overlap problem in natural conversation transcription. This paper …

Cited by 36 · 4 versions

Token-level sequence labeling for spoken language understanding using compositional end-to-end models

S Arora, S Dalmia, B Yan, F Metze, AW Black… - arxiv preprint arxiv …, 2022 - arxiv.org

End-to-end spoken language understanding (SLU) systems are gaining popularity over
cascaded approaches due to their simplicity and ability to avoid error propagation. However …

Cited by 18 · 5 versions

Residual language model for end-to-end speech recognition

E Tsunoo, Y Kashiwagi, C Narisetty… - arxiv preprint arxiv …, 2022 - arxiv.org

End-to-end automatic speech recognition suffers from adaptation to unknown target domain
speech despite being trained with a large amount of paired audio-text data. Recent studies …

Cited by 14 · 4 versions

Improving contextual recognition of rare words with an alternate spelling prediction model

JD Fox, N Delworth - arxiv preprint arxiv:2209.01250, 2022 - arxiv.org

Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent
interest as ASR use becomes more widespread. We are releasing contextual biasing lists to …

Cited by 20 · 4 versions

Earnings-21: A practical benchmark for ASR in the wild

Robust speech recognition via large-scale weak supervision