[PDF][PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Recent advances in recurrent neural networks

H Salehinejad, S Sankar, J Barfett, E Colak… - arxiv preprint arxiv …, 2017 - arxiv.org
Recurrent neural networks (RNNs) are capable of learning features and long term
dependencies from sequential and time-series data. The RNNs have a stack of non-linear …

A general survey on attention mechanisms in deep learning

G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org
Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arxiv preprint arxiv:2201.02184, 2022 - arxiv.org
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Specaugment: A simple data augmentation method for automatic speech recognition

DS Park, W Chan, Y Zhang, CC Chiu, B Zoph… - arxiv preprint arxiv …, 2019 - arxiv.org
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (ie, filter bank …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …

code2vec: Learning distributed representations of code

U Alon, M Zilberstein, O Levy, E Yahav - Proceedings of the ACM on …, 2019 - dl.acm.org
We present a neural model for representing snippets of code as continuous distributed
vectors (``code embeddings''). The main idea is to represent a code snippet as a single fixed …

Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition

L Dong, S Xu, B Xu - 2018 IEEE international conference on …, 2018 - ieeexplore.ieee.org
Recurrent sequence-to-sequence models using encoder-decoder architecture have made
great progress in speech recognition task. However, they suffer from the drawback of slow …