SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context
learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …

Contextualized end-to-end automatic speech recognition with intermediate biasing loss

M Shakeel, Y Sudo, Y Peng, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Contextualized end-to-end automatic speech recognition has been an active research area,
with recent efforts focusing on the implicit learning of contextual phrases based on the final …

Phoneme-aware encoding for prefix-tree-based contextual ASR

H Futami, E Tsunoo, Y Kashiwagi… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In speech recognition applications, it is important to recognize context-specific rare words,
such as proper nouns. Tree-constrained Pointer Generator (TCPGen) has shown promise …

Adapting OpenAI's Whisper for speech recognition on code-switch Mandarin-English SEAME and ASRU2019 datasets

Y Yang, Y Peng, H Huang, ES Chng… - 2024 Asia Pacific …, 2024 - ieeexplore.ieee.org
This paper reports on SOTA results achieved using OpenAI's Whisper model with adaptation
on different adaptation corpus sizes for two established code-switch Mandarin/English …

Keyword-guided adaptation of automatic speech recognition

A Shamsian, A Navon, N Glazer, G Hetz… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic Speech Recognition (ASR) technology has made significant progress in recent
years, providing accurate transcription across various domains. However, some challenges …

Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text

J Li, Y Pu, Q Sun, WQ Zhang - arXiv preprint arXiv:2408.05554, 2024 - arxiv.org
Whisper and other large-scale automatic speech recognition models have made significant
progress in performance. However, their performance on many low-resource languages …

Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian

K Chaparala, G Zarrella, BT Fischer, L Kimura… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper we address the challenge of improving Automatic Speech Recognition (ASR)
for a low-resource language, Hawaiian, by incorporating large amounts of independent text …

Speech-enriched memory for inference-time adaptation of ASR models to word dictionaries

A Mittal, S Sarawagi, P Jyothi, G Saon… - Proceedings of the …, 2023 - aclanthology.org
Despite the impressive performance of ASR models on mainstream benchmarks, their
performance on rare words is unsatisfactory. In enterprise settings, often a focused list of …

Contextual Biasing Speech Recognition in Speech-enhanced Large Language Model

X Gong, A Lv, Z Wang, Y Qian - Proc. Interspeech. ISCA, 2024 - isca-archive.org
Recently, the rapid advancements in audio- and speech-enhanced large language models
(SpeechLLMs), such as Qwen-Audio and SALMONN, have significantly propelled automatic …

Enhancing quantised end-to-end ASR models via personalisation

Q Zhao, G Sun, C Zhang, M Xu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Recent end-to-end automatic speech recognition (ASR) models have become increasingly
larger, making them particularly challenging to be deployed on resource-constrained …