Prompting large language models for zero-shot domain adaptation in speech recognition

Y Li, Y Wu, J Li, S Liu - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …

Adaptable end-to-end ASR models using replaceable internal LMs and residual softmax

K Deng, PC Woodland - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence
distribution of paired audio-transcript training data. However, it still suffers from domain shifts …

Label-synchronous neural transducer for end-to-end ASR

K Deng, PC Woodland - arXiv preprint arXiv:2307.03088, 2023 - arxiv.org
Neural transducers provide a natural approach to streaming ASR. However, they augment
output sequences with blank tokens which leads to challenges for domain adaptation using …

Decoder-only architecture for speech recognition with CTC prompts and text data augmentation

E Tsunoo, H Futami, Y Kashiwagi, S Arora… - arXiv preprint arXiv …, 2023 - arxiv.org
Collecting audio-text pairs is expensive; however, it is much easier to access text-only data.
Unless using shallow fusion, end-to-end automatic speech recognition (ASR) models …

Decoupled structure for improved adaptability of end-to-end models

K Deng, PC Woodland - Speech Communication, 2024 - Elsevier
Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great
success by jointly learning acoustic and linguistic information, it still suffers from the effect of …

Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning

FT Liao, YC Chan, YC Chen, CJ Hsu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this work, we propose a method to create domain-sensitive speech recognition models
that utilize textual domain information by conditioning their generation on a given text prompt …

Label-synchronous neural transducer for adaptable online E2E speech recognition

K Deng, PC Woodland - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art
recognition accuracy, it tends to be implicitly biased towards the training data distribution …

Hybrid Attention-Based Encoder-Decoder Model for Efficient Language Model Adaptation

S Ling, G Ye, R Zhao, Y Gong - 2024 IEEE Spoken Language …, 2024 - ieeexplore.ieee.org
The attention-based encoder-decoder (AED) speech recognition model has been widely
successful in recent years. However, the joint optimization of acoustic model and language …

FastInject: Injecting Unpaired Text Data into CTC-Based ASR Training

K Deng, PC Woodland - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic
speech recognition (ASR) models have achieved impressive results, especially with the …

Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization

S Seo, JH Kim - Computers, Materials & Continua, 2023 - cdn.techscience.cn
Automatic speech recognition (ASR) systems have emerged as indispensable tools across a
wide spectrum of applications, ranging from transcription services to voice-activated …