Google Scholar

Leveraging acoustic contextual representation by audio-textual cross-modal learning for conversat...

Advanced long-content speech recognition with factorized neural transducer

X Gong, Y Wu, J Li, S Liu, R Zhao… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org

Long-content automatic speech recognition (ASR) has obtained increasing interest in recent
years, as it captures the relationship among consecutive historical utterances while …‏

Cited by 6 · All 4 versions

[PDF] arxiv.org

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation‏

K Wei, B Li, H Lv, Q Lu, N Jiang… - IEEE/ACM Transactions …, 2024‏ - ieeexplore.ieee.org‏

Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …‏

Cited by 4 · All 3 versions

[PDF] arxiv.org

Towards effective and compact contextual representation for conformer transducer speech recognition systems‏

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …‏

Cited by 8 · All 5 versions

[PDF] arxiv.org

Longfnt: Long-form speech recognition with factorized neural transducer‏

X Gong, Y Wu, J Li, S Liu, R Zhao… - ICASSP 2023-2023 …, 2023‏ - ieeexplore.ieee.org‏

Traditional automatic speech recognition (ASR) systems usually focus on individual
utterances, without considering long-form speech with useful historical information, which is …‏

Cited by 9 · All 3 versions

[PDF] arxiv.org

Context-aware fine-tuning of self-supervised speech models‏

S Shon, F Wu, K Kim, P Sridhar… - ICASSP 2023-2023 …, 2023‏ - ieeexplore.ieee.org‏

Self-supervised pre-trained transformers have improved the state of the art on a variety of
speech tasks. Due to the quadratic time and space complexity of self-attention, they usually …‏

Cited by 7 · All 5 versions

[PDF] arxiv.org

Updated Corpora and Benchmarks for Long-Form Speech Recognition‏

JD Fox, D Raj, N Delworth, Q McNamara… - ICASSP 2024-2024 …, 2024‏ - ieeexplore.ieee.org‏

The vast majority of ASR research uses corpora in which both the training and test data have
been pre-segmented into utterances. In most real-world ASR use-cases, however, test audio …

Cited by 6 · All 2 versions

[PDF] arxiv.org

Efficient Long-Form Speech Recognition for General Speech In-Context Learning‏

H Yen, S Ling, G Ye - arXiv preprint arXiv:2409.19757, 2024 - arxiv.org

We propose a novel approach to end-to-end automatic speech recognition (ASR) to achieve
efficient speech in-context learning (SICL) for (i) long-form speech decoding, (ii) test-time …

[PDF] arxiv.org

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR‏

M Cui, Y Yang, J Deng, J Kang, S Hu, T Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised learning (SSL) based discrete speech representations are highly compact
and domain adaptable. In this paper, SSL discrete speech features extracted from WavLM …‏

All 2 versions

[PDF] arxiv.org

Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models‏

S Shon, K Kim, P Sridhar, YT Hsu… - ICASSP 2024-2024 …, 2024‏ - ieeexplore.ieee.org‏

When performing tasks like automatic speech recognition or spoken language
understanding for a given utterance, access to preceding text or audio provides contextual …‏

Cited by 2 · All 3 versions

4 Cross-Modal Generation of Visual and Auditory‏

F Gao, M Liu, Y Zhou - Artificial Intelligence for Art Creation and …, 2024‏ - books.google.com‏

With the breakthrough progress of generative models in the field of AI painting, AIGC has
attracted widespread attention and become one of the hottest research directions driving the …‏
