Automatic speech recognition using advanced deep learning approaches: A survey
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
Hyporadise: An open baseline for generative speech recognition with large language models
Advancements in deep neural networks have allowed automatic speech recognition (ASR)
systems to attain human parity on several publicly available clean speech datasets …
systems to attain human parity on several publicly available clean speech datasets …
Whispering LLaMA: A cross-modal generative error correction framework for speech recognition
We introduce a new cross-modal fusion technique designed for generative error correction
in automatic speech recognition (ASR). Our methodology leverages both acoustic …
in automatic speech recognition (ASR). Our methodology leverages both acoustic …
Self-taught recognizer: Toward unsupervised adaptation for speech foundation models
We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which
leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) …
leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) …
Salm: Speech-augmented language model with in-context learning for speech recognition and translation
We present a novel Speech Augmented Language Model (SALM) with multitask and in-
context learning capabilities. SALM comprises a frozen text LLM, a audio encoder, a …
context learning capabilities. SALM comprises a frozen text LLM, a audio encoder, a …
Large language model based generative error correction: A challenge and baselines for speech recognition, speaker tagging, and emotion recognition
Given recent advances in generative AI technology, a key question is how large language
models (LLMs) can enhance acoustic modeling tasks using text decoding results from a …
models (LLMs) can enhance acoustic modeling tasks using text decoding results from a …
Large language models are efficient learners of noise-robust speech recognition
Recent advances in large language models (LLMs) have promoted generative error
correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic …
correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic …
Chain-of-Thought Prompting for Speech Translation
Large language models (LLMs) have demonstrated remarkable advancements in language
understanding and generation. Building on the success of text-based LLMs, recent research …
understanding and generation. Building on the success of text-based LLMs, recent research …
Tuning large language model for speech recognition with mixed-scale re-tokenization
Large Language Models (LLMs) have proven successful across a spectrum of speech-
related tasks, such as speech recognition, text-to-speech, and spoken language …
related tasks, such as speech recognition, text-to-speech, and spoken language …
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
This paper presents an efficient decoding approach for end-to-end automatic speech
recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the …
recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the …