Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …

Improving RNN transducer modeling for end-to-end speech recognition

J Li, R Zhao, H Hu, Y Gong - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
In the last few years, an emerging trend in automatic speech recognition research is the
study of end-to-end (E2E) systems. Connectionist Temporal Classification (CTC), Attention …

On the comparison of popular end-to-end models for large scale speech recognition

J Li, Y Wu, Y Gaur, C Wang, R Zhao, S Liu - ar…

Developing RNN-T models surpassing high-performance hybrid models with customization capability

J Li, R Zhao, Z Meng, Y Liu, W Wei… - arXiv preprint arXiv …, 2020 - arxiv.org
Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very
promising end-to-end (E2E) model that may replace the popular hybrid model for automatic …

Colon cancer diagnosis based on machine learning and deep learning: Modalities and analysis techniques

M Tharwat, NA Sakr, S El-Sappagh, H Soliman… - Sensors, 2022 - mdpi.com
The treatment and diagnosis of colon cancer are considered to be social and economic
challenges due to the high mortality rates. Every year, around the world, almost half a million …

Bytes are all you need: End-to-end multilingual speech recognition and synthesis with bytes

B Li, Y Zhang, T Sainath, Y Wu… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio (B2A), for
multilingual speech recognition and synthesis. Prior work has predominantly used …

Towards code-switching ASR for end-to-end CTC models

K Li, J Li, G Ye, R Zhao, Y Gong - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Although great progress has been made on end-to-end (E2E) models for monolingual and
multilingual automatic speech recognition (ASR), there is no successful study for E2E …

Internal language model training for domain-adaptive end-to-end speech recognition

Z Meng, N Kanda, Y Gaur… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …