Accelerating RNN-T training and inference using CTC guidance

Y Wang, Z Chen, C Zheng, Y Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose a novel method to accelerate the training and inference processes of the recurrent neural
network transducer (RNN-T) based on guidance from a co-trained connectionist …

Adapting large language model with speech for fully formatted end-to-end speech recognition

S Ling, Y Hu, S Qian, G Ye, Y Qian… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Most end-to-end (E2E) speech recognition models are composed of encoder and decoder
blocks that perform acoustic and language modeling functions. Pretrained large language …

Decoder-only architecture for speech recognition with CTC prompts and text data augmentation

E Tsunoo, H Futami, Y Kashiwagi, S Arora… - ar…

… in neural transducer

Y Yang, X Yang, L Guo, Z Yao, W Kang… - ar…

…: Highly Efficient Decoding for Transducers

V Bataev, H Xu, D Galvez, V Lavrukhin… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces a highly efficient greedy decoding algorithm for Transducer inference.
We propose a novel data structure using CUDA tensors to represent partial hypotheses in a …
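For orientation only, here is a minimal sketch of the conventional frame-by-frame greedy loop for transducer inference, the baseline that work on efficient greedy decoding starts from. It is not the algorithm or the CUDA tensor data structure from the paper above. The `joint(t, last_label)` interface is a hypothetical simplification: a real transducer combines the encoder output at frame `t` with a prediction-network state, not a bare label. The names `greedy_transducer_decode`, `toy_joint`, and `SCRIPT` are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical joint-network interface (an assumption, not from the paper):
# given an encoder frame index and the last emitted label, return logits
# over the vocabulary. A real transducer would combine the encoder output
# for frame t with a prediction-network state instead of a bare label.
JointFn = Callable[[int, int], Sequence[float]]


def greedy_transducer_decode(
    num_frames: int,
    joint: JointFn,
    blank_id: int = 0,
    max_symbols_per_frame: int = 10,
) -> List[int]:
    """Conventional frame-by-frame greedy decoding for a transducer."""
    hyp: List[int] = []
    last_label = blank_id  # start-of-sequence context (assumed to be blank)
    for t in range(num_frames):
        # Emit labels on frame t until the joint prefers blank or the
        # per-frame symbol cap is reached.
        for _ in range(max_symbols_per_frame):
            logits = joint(t, last_label)
            best = max(range(len(logits)), key=lambda k: logits[k])
            if best == blank_id:
                break  # blank: advance to the next encoder frame
            hyp.append(best)  # non-blank: emit and stay on the same frame
            last_label = best
    return hyp


# Toy joint for illustration: a fixed table of which label wins for each
# (frame, last label) pair; unlisted pairs default to blank (id 0).
SCRIPT: Dict[Tuple[int, int], int] = {(0, 0): 2, (2, 2): 3, (2, 3): 1}


def toy_joint(t: int, last: int) -> List[float]:
    winner = SCRIPT.get((t, last), 0)
    return [1.0 if k == winner else 0.0 for k in range(4)]


if __name__ == "__main__":
    print(greedy_transducer_decode(3, toy_joint))  # [2, 3, 1]
```

The per-frame inner loop is the part that is awkward to batch, since different utterances can emit different numbers of labels on the same frame; this sketch decodes a single utterance and makes no attempt at batching.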