Deep representation learning in speech processing: Challenges, recent advances, and future trends
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …
SpeechBrain: A general-purpose speech toolkit
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
Automatic speech recognition: a survey
Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …
ESPnet: End-to-end speech processing toolkit
This paper introduces a new open source platform for end-to-end speech processing named
ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and …
The Kaldi speech recognition toolkit
We describe the design of Kaldi, a free, open-source toolkit for speech recognition research.
Kaldi provides a speech recognition system based on finite-state transducers (using the …
Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers
Machine-learning (ML) enables computers to learn how to recognise patterns, make
unintended decisions, or react to a dynamic environment. The effectiveness of trained …
Moddrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on multi-scale and multi-
modal deep learning. Each visual modality captures spatial information at a particular spatial …
Recent development of open-source speech recognition engine Julius
Julius is an open-source large-vocabulary speech recognition software used for both
academic research and industrial applications. It executes real-time speech recognition of a …
Video and audio processing in paediatrics: A review
Objective: Video and sound acquisition and processing technologies have seen great
improvements in recent decades, with many applications in the biomedical area. The aim of …
SPPAS-multi-lingual approaches to the automatic annotation of speech
B Bigi - The Phonetician. Journal of the International Society of …, 2015 - hal.science
The first step of most acoustic analyses unavoidably involves the alignment of recorded
speech sounds with their phonetic annotation. This step is very labor-intensive and cost …