Whispering LLaMA: A cross-modal generative error correction framework for speech recognition

S Radhakrishnan, CHH Yang, SA Khan… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a new cross-modal fusion technique designed for generative error correction
in automatic speech recognition (ASR). Our methodology leverages both acoustic …

Context-aware transformer transducer for speech recognition

FJ Chang, J Liu, M Radfar, A Mouchtaris… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty
recognizing uncommon words that appear infrequently in the training data. One promising …

Low-rank adaptation of large language model rescoring for parameter-efficient speech recognition

Y Yu, CHH Yang, J Kolehmainen… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We propose a neural language modeling system based on low-rank adaptation (LoRA) for
speech recognition output rescoring. Although pretrained language models (LMs) like BERT …

A comparative analysis of automatic speech recognition errors in small group classroom discourse

J Cao, A Ganesh, J Cai, R Southwell… - Proceedings of the 31st …, 2023 - dl.acm.org
In collaborative learning environments, effective intelligent learning systems need to
accurately analyze and understand the collaborative discourse between learners (i.e., group …

Personalization strategies for end-to-end speech recognition systems

A Gourav, L Liu, A Gandhe, Y Gu, G Lan… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The recognition of personalized content, such as contact names, remains a challenging
problem for end-to-end speech recognition systems. In this work, we demonstrate how first …

An investigation into an always listening interface to support data exploration

RS Tabalba, N Kirshenbaum, J Leigh… - Proceedings of the 28th …, 2023 - dl.acm.org
Natural Language Interfaces that facilitate data exploration tasks are rapidly gaining in
interest in the research community because they enable users to focus their attention on the …

Vulcan: Automatic Query Planning for Live ML Analytics

Y Zhang, X Zhang, G Ananthanarayanan… - … USENIX Symposium on …, 2024 - usenix.org
Live ML analytics have gained increasing popularity with large-scale deployments due to
recent evolution of ML technologies. To serve live ML queries, experts nowadays still need …

Multi-task language modeling for improving speech recognition of rare words

CHH Yang, L Liu, A Gandhe, Y Gu… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
End-to-end automatic speech recognition (ASR) systems are increasingly popular due to
their relative architectural simplicity and competitive performance. However, even though the …

Zero-shot domain-sensitive speech recognition with prompt-conditioning fine-tuning

FT Liao, YC Chan, YC Chen, CJ Hsu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this work, we propose a method to create domain-sensitive speech recognition models
that utilize textual domain information by conditioning their generation on a given text prompt …

Multi-modal retrieval for large language model based speech recognition

A Gourav, J Kolehmainen, P Shivakumar… - Findings of the …, 2024 - aclanthology.org
Retrieval is a widely adopted approach for improving language models leveraging external
information. As the field moves towards multi-modal large language models, it is important to …