Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
SpeechBrain: A general-purpose speech toolkit
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
Torchaudio: Building blocks for audio and speech processing
This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …
Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition
The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …
End-to-end Arabic speech recognition: A review
Automatic speech recognition (ASR) is a crucial field of science due to its massive
applications that can be developed to help humans to improve their daily life tasks. Despite …
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …
Arabic speech recognition using end‐to‐end deep learning
Arabic automatic speech recognition (ASR) methods with diacritics have the ability to be
integrated with other systems better than Arabic ASR methods without diacritics. In this work …
Efficient sequence transduction by jointly predicting tokens and durations
This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for
sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by …
Wake word detection with streaming transformers
Modern wake word detection systems usually rely on neural networks for acoustic modeling.
Transformers have recently shown superior performance over LSTM and convolutional …