Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Deep spoken keyword spotting: An overview
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …
Deep learning for audio signal processing
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …
Deep learning for environmentally robust speech recognition: An overview of recent developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …
The pytorch-kaldi speech recognition toolkit
The availability of open-source software is playing a remarkable role in the popularization of
speech recognition and deep learning. Kaldi, for instance, is nowadays an established …
Multichannel signal processing with deep neural networks for automatic speech recognition
Multichannel automatic speech recognition (ASR) systems commonly separate speech
enhancement, including localization, beamforming, and postfiltering, from acoustic …
Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
Speech processing for digital home assistants: Combining signal processing with deep-learning techniques
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …
FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing
Beamforming has been extensively investigated for multi-channel audio processing tasks.
Recently, learning-based beamforming methods, sometimes called neural beamformers …
Single channel target speaker extraction and recognition with speaker beam
This paper addresses the problem of single channel speech recognition of a target speaker
in a mixture of speech signals. We propose to exploit auxiliary speaker information provided …