Automatic speech recognition using advanced deep learning approaches: A survey

H Kheddar, M Hemis, Y Himeur - Information Fusion, 2024 - Elsevier
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …

HyPoradise: An open baseline for generative speech recognition with large language models

C Chen, Y Hu, CHH Yang… - Advances in …, 2023 - proceedings.neurips.cc
Advancements in deep neural networks have allowed automatic speech recognition (ASR)
systems to attain human parity on several publicly available clean speech datasets …

Whispering LLaMA: A cross-modal generative error correction framework for speech recognition

S Radhakrishnan, CHH Yang, SA Khan… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a new cross-modal fusion technique designed for generative error correction
in automatic speech recognition (ASR). Our methodology leverages both acoustic …

Self-taught recognizer: Toward unsupervised adaptation for speech foundation models

Y Hu, C Chen, CH Yang, C Qin… - Advances in …, 2025 - proceedings.neurips.cc
We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which
leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) …

SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and
in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …

Large language model based generative error correction: A challenge and baselines for speech recognition, speaker tagging, and emotion recognition

CHH Yang, T Park, Y Gong, Y Li, Z Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Given recent advances in generative AI technology, a key question is how large language
models (LLMs) can enhance acoustic modeling tasks using text decoding results from a …

Large language models are efficient learners of noise-robust speech recognition

Y Hu, C Chen, CHH Yang, R Li, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in large language models (LLMs) have promoted generative error
correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic …

Chain-of-Thought Prompting for Speech Translation

K Hu, Z Chen, CHH Yang, P Żelasko… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable advancements in language
understanding and generation. Building on the success of text-based LLMs, recent research …

Tuning large language model for speech recognition with mixed-scale re-tokenization

Y Ma, C Zhang, Q Chen, W Wang… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have proven successful across a spectrum of
speech-related tasks, such as speech recognition, text-to-speech, and spoken language …

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition

T Hori, M Kocour, A Haider, E McDermott… - arXiv preprint arXiv …, 2025 - arxiv.org
This paper presents an efficient decoding approach for end-to-end automatic speech
recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the …