Prompting large language models for zero-shot domain adaptation in speech recognition
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …
Adaptable end-to-end ASR models using replaceable internal LMs and residual softmax
End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence
distribution of paired audio-transcript training data. However, it still suffers from domain shifts …
Label-synchronous neural transducer for end-to-end ASR
Neural transducers provide a natural approach to streaming ASR. However, they augment
output sequences with blank tokens, which leads to challenges for domain adaptation using …
Decoder-only architecture for speech recognition with CTC prompts and text data augmentation
Collecting audio-text pairs is expensive; however, it is much easier to access text-only data.
Unless using shallow fusion, end-to-end automatic speech recognition (ASR) models …
Decoupled structure for improved adaptability of end-to-end models
Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great
success by jointly learning acoustic and linguistic information, it still suffers from the effect of …
Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning
In this work, we propose a method to create domain-sensitive speech recognition models
that utilize textual domain information by conditioning their generation on a given text prompt …
Label-synchronous neural transducer for adaptable online E2E speech recognition
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art
recognition accuracy, it tends to be implicitly biased towards the training data distribution …
Hybrid Attention-Based Encoder-Decoder Model for Efficient Language Model Adaptation
The attention-based encoder-decoder (AED) speech recognition model has been widely
successful in recent years. However, the joint optimization of acoustic model and language …
FastInject: Injecting Unpaired Text Data into CTC-Based ASR Training
Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic
speech recognition (ASR) models have achieved impressive results, especially with the …
Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization
Automatic speech recognition (ASR) systems have emerged as indispensable tools across a
wide spectrum of applications, ranging from transcription services to voice-activated …