Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has been seeing a significant trend of moving from deep neural
network-based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Recent advances in recurrent neural networks
Recurrent neural networks (RNNs) are capable of learning features and long-term
dependencies from sequential and time-series data. RNNs have a stack of non-linear …
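As a rough illustration of the recurrent update such surveys cover, the sketch below (hypothetical dimensions and variable names, plain NumPy) steps a single vanilla RNN cell over a short sequence; practical systems stack several such non-linear layers.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN update: the new hidden state mixes the current
    input with the previous state through a tanh non-linearity."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Hypothetical sizes: 3 time steps, 4-dim inputs, 5-dim hidden state.
rng = np.random.default_rng(0)
T, d_in, d_h = 3, 4, 5
W_xh = rng.normal(scale=0.1, size=(d_in, d_h))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

h = np.zeros(d_h)
for x_t in rng.normal(size=(T, d_in)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (5,)
```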
A general survey on attention mechanisms in deep learning
G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org
Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …
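For concreteness, here is a minimal NumPy sketch of the scaled dot-product attention that such surveys analyze (hypothetical shapes; real models add learned projections, multiple heads, and masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value by how well its key matches the query."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V

# Hypothetical example: 2 queries attending over 4 key/value pairs of dim 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 8)
```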
Learning audio-visual speech representation by masked multimodal cluster prediction
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
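As a quick reading aid for the "more than 50% relative" figure, relative word-error-rate reduction expresses the improvement as a fraction of the baseline WER; a minimal sketch with hypothetical numbers:

```python
def relative_wer_reduction(wer_old: float, wer_new: float) -> float:
    """Fraction of the baseline word error rate that was removed."""
    return (wer_old - wer_new) / wer_old

# Hypothetical example: dropping from 12% to 6% WER is a 50% relative reduction.
print(relative_wer_reduction(0.12, 0.06))  # 0.5
```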
Specaugment: A simple data augmentation method for automatic speech recognition
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …
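A minimal sketch of the kind of time and frequency masking SpecAugment applies directly to filter-bank features (hypothetical mask sizes and array names; the paper also describes time warping, omitted here):

```python
import numpy as np

def mask_spectrogram(feats, num_freq_masks=1, F=8, num_time_masks=1, T=20, rng=None):
    """Zero out random frequency bands and time spans of a (time, mel) feature matrix."""
    rng = rng or np.random.default_rng()
    feats = feats.copy()
    n_t, n_f = feats.shape
    for _ in range(num_freq_masks):
        f = rng.integers(0, F + 1)          # width of the frequency mask
        f0 = rng.integers(0, n_f - f + 1)   # starting mel bin
        feats[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):
        t = rng.integers(0, T + 1)          # length of the time mask
        t0 = rng.integers(0, n_t - t + 1)   # starting frame
        feats[t0:t0 + t, :] = 0.0
    return feats

# Hypothetical 100-frame, 80-bin log-mel spectrogram.
spec = np.random.default_rng(0).normal(size=(100, 80))
print(mask_spectrogram(spec).shape)  # (100, 80)
```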
Attention, please! A survey of neural attention models in deep learning
In humans, attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
Streaming end-to-end speech recognition for mobile devices
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …
code2vec: Learning distributed representations of code
We present a neural model for representing snippets of code as continuous distributed
vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed …
vectors (``code embeddings''). The main idea is to represent a code snippet as a single fixed …
Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition
L Dong, S Xu, B Xu - 2018 IEEE international conference on …, 2018 - ieeexplore.ieee.org
Recurrent sequence-to-sequence models using an encoder-decoder architecture have made
great progress in speech recognition tasks. However, they suffer from the drawback of slow …
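Because a no-recurrence model has no hidden-state chain to convey order, position is typically injected explicitly; a minimal sketch of sinusoidal positional encodings of the kind used in Transformer-style models (hypothetical sequence length and model dimension):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Deterministic position features: sinusoids of geometrically spaced wavelengths."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)   # even channels
    enc[:, 1::2] = np.cos(angles)   # odd channels
    return enc

# Hypothetical 50-frame input with 64-dim model features.
print(sinusoidal_positions(50, 64).shape)  # (50, 64)
```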