SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …

Multilingual audio-visual speech recognition with hybrid CTC/RNN-T Fast Conformer

M Burchi, KC Puvvada, J Balam… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Humans are adept at leveraging visual cues from lip movements for recognizing speech in
adverse listening conditions. Audio-Visual Speech Recognition (AVSR) models follow …