Thank you for attention: A survey on attention-based artificial neural networks for automatic speech recognition

P Karmakar, SW Teng, G Lu - Intelligent Systems with Applications, 2024 - Elsevier
Attention is a very popular and effective mechanism in artificial neural network-based
sequence-to-sequence models. In this survey paper, a comprehensive review of the different …

Libriheavy: A 50,000 hours ASR corpus with punctuation casing and context

W Kang, X Yang, Z Yao, F Kuang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours
of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is …

An embarrassingly simple approach for LLM with strong ASR capacity

Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we focus on solving one of the most important tasks in the field of speech
processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and …

Towards universal speech discrete tokens: A case study for ASR and TTS

Y Yang, F Shen, C Du, Z Ma, K Yu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The proficiency of self-supervised learning (SSL) in speech-related tasks has driven research into
utilizing discrete tokens for speech tasks like recognition and translation, which offer lower …

VALL-T: Decoder-only generative transducer for robust and decoding-controllable text-to-speech

C Du, Y Guo, H Wang, Y Yang, Z Niu, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and
VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot …

Exploring the capability of Mamba in speech applications

K Miyazaki, Y Masuyama, M Murata - arXiv preprint arXiv:2406.16808, 2024 - arxiv.org
This paper explores the capability of Mamba, a recently proposed architecture based on
state space models (SSMs), as a competitive alternative to Transformer-based models. In …

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Y Yang, Z Song, J Zhuo, M Cui, J Li, B Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The evolution of speech technology has been spurred by the rapid increase in dataset sizes.
Traditional speech models generally depend on a large amount of labeled training data …

PromptASR for contextualized ASR with controllable style

X Yang, W Kang, Z Yao, Y Yang, L Guo… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Prompts are crucial to large language models as they provide context information such as
topic or logical relationships. Inspired by this, we propose PromptASR, a framework that …

Spontaneous style text-to-speech synthesis with controllable spontaneous behaviors based on language models

W Li, P Yang, Y Zhong, Y Zhou, Z Wang, Z Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Spontaneous style speech synthesis, which aims to generate human-like speech, often
encounters challenges due to the scarcity of high-quality data and limitations in model …

LibriheavyMix: A 20,000-hour dataset for single-channel reverberant multi-talker speech separation, ASR and speaker diarization

Z Jin, Y Yang, M Shi, W Kang, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …