Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-engineered
acoustic features (feature engineering) as a separate distinct problem from the …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Automatic speech recognition: a survey

M Malik, MK Malik, K Mehmood… - Multimedia Tools and …, 2021 - Springer
Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …

ESPnet: End-to-end speech processing toolkit

S Watanabe, T Hori, S Karita, T Hayashi… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper introduces a new open source platform for end-to-end speech processing named
ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and …

The Kaldi speech recognition toolkit

D Povey, A Ghoshal, G Boulianne… - IEEE 2011 workshop …, 2011 - infoscience.epfl.ch
We describe the design of Kaldi, a free, open-source toolkit for speech recognition research.
Kaldi provides a speech recognition system based on finite-state transducers (using the …

Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers

G Ateniese, LV Mancini, A Spognardi… - … Journal of Security …, 2015 - inderscienceonline.com
Machine-learning (ML) enables computers to learn how to recognise patterns, make
unintended decisions, or react to a dynamic environment. The effectiveness of trained …

ModDrop: adaptive multi-modal gesture recognition

N Neverova, C Wolf, G Taylor… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
We present a method for gesture detection and localisation based on multi-scale and multi-modal
deep learning. Each visual modality captures spatial information at a particular spatial …

Recent development of open-source speech recognition engine Julius

A Lee, T Kawahara - Asia-Pacific Signal and Information …, 2009 - academia.edu
Julius is an open-source large-vocabulary speech recognition software used for both
academic research and industrial applications. It executes real-time speech recognition of a …

Video and audio processing in paediatrics: A review

S Cabon, F Porée, A Simon, O Rosec… - Physiological …, 2019 - iopscience.iop.org
Objective: Video and sound acquisition and processing technologies have seen great
improvements in recent decades, with many applications in the biomedical area. The aim of …

SPPAS - multi-lingual approaches to the automatic annotation of speech

B Bigi - The Phonetician. Journal of the International Society of …, 2015 - hal.science
The first step of most acoustic analyses unavoidably involves the alignment of recorded
speech sounds with their phonetic annotation. This step is very labor-intensive and cost …