Foundation transformers

H Wang, S Ma, S Huang, L Dong, W Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
A big convergence of model architectures across language, vision, speech, and multimodal
is emerging. However, under the same name "Transformers", the above areas use different …

Magneto: A foundation transformer

H Wang, S Ma, S Huang, L Dong… - International …, 2023 - proceedings.mlr.press
A big convergence of model architectures across language, vision, speech, and multimodal
is emerging. However, under the same name "Transformers", the above areas use different …

BERT meets CTC: New formulation of end-to-end speech recognition with pre-trained masked language model

Y Higuchi, B Yan, S Arora, T Ogawa… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that
adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the …

CCE-Net: Causal Convolution Embedding Network for Streaming Automatic Speech Recognition

F Deng, Y Ming, B Lyu - International Journal of Network Dynamics and …, 2024 - sciltp.com
Streaming Automatic Speech Recognition (ASR) has gained significant attention across
various application scenarios, including video conferencing, live sports events, and …

Streaming end-to-end target-speaker automatic speech recognition and activity detection

T Moriya, H Sato, T Ochiai, M Delcroix… - IEEE Access, 2023 - ieeexplore.ieee.org
Automatic speech recognition of a target speaker in the presence of interfering speakers
remains a challenging issue. One approach to tackle this problem is target-speaker speech …

BECTRA: Transducer-based end-to-end ASR with BERT-enhanced encoder

Y Higuchi, T Ogawa, T Kobayashi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech
recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder …

Memory-efficient training of RNN-Transducer with sampled softmax

J Lee, L Lee, S Watanabe - arXiv preprint arXiv:2203.16868, 2022 - arxiv.org
RNN-Transducer has been one of promising architectures for end-to-end automatic speech
recognition. Although RNN-Transducer has many advantages including its strong accuracy …

Decoupled structure for improved adaptability of end-to-end models

K Deng, PC Woodland - Speech Communication, 2024 - Elsevier
Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great
success by jointly learning acoustic and linguistic information, it still suffers from the effect of …

miniStreamer: Enhancing small conformer with chunked-context masking for streaming ASR applications on the edge

H Gulzar, MR Busto, T Eda, K Itoyama, K Nakadai - Interspeech, 2023 - isca-archive.org
Real-time applications of Automatic Speech Recognition (ASR) on user devices on the edge
require streaming processing. Conformer model has achieved state-of-the-art performance …

Transformer model compression for end-to-end speech recognition on mobile devices

LB Letaifa, JL Rouas - 2022 30th European Signal Processing …, 2022 - ieeexplore.ieee.org
Transformer-based models have achieved state-of-the-art performance in various areas of
machine learning, including automatic speech recognition. However, their cost in terms of …