[PDF][PDF] Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
Neural machine translation for low-resource languages: A survey
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …
the early 2000s and has already entered a mature phase. While considered the most widely …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
Speecht5: Unified-modal encoder-decoder pre-training for spoken language processing
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural
language processing models, we propose a unified-modal SpeechT5 framework that …
language processing models, we propose a unified-modal SpeechT5 framework that …
Pushing the limits of semi-supervised learning for automatic speech recognition
We employ a combination of recent developments in semi-supervised learning for automatic
speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled …
speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled …
Bigssl: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
models pre-trained using large, diverse unlabeled datasets containing approximately a …
Specaugment: A simple data augmentation method for automatic speech recognition
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (ie, filter bank …
SpecAugment is applied directly to the feature inputs of a neural network (ie, filter bank …
Masked language model scoring
Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead,
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …
Hierarchical neural story generation
We explore story generation: creative systems that can build coherent and fluent passages
of text about a topic. We collect a large dataset of 300K human-written stories paired with …
of text about a topic. We collect a large dataset of 300K human-written stories paired with …