Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

Trends and developments in automatic speech recognition research

D O'Shaughnessy - Computer Speech & Language, 2024 - Elsevier
This paper discusses how automatic speech recognition systems are and could be
designed, in order to best exploit the discriminative information encoded in human speech …

Wake word detection with streaming transformers

Y Wang, H Lv, D Povey, L Xie… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Modern wake word detection systems usually rely on neural networks for acoustic modeling.
Transformers have recently shown superior performance over LSTM and convolutional …

CTC variations through new WFST topologies

A Laptev, S Majumdar, B Ginsburg - arXiv preprint arXiv:2110.03098, 2021 - arxiv.org
This paper presents novel Weighted Finite-State Transducer (WFST) topologies to
implement Connectionist Temporal Classification (CTC)-like algorithms for automatic …

Principled comparisons for end-to-end speech recognition: Attention vs hybrid at the 1000-hour scale

A Rouhe, T Grósz, M Kurimo - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
End-to-End speech recognition has become the center of attention for speech recognition
research, but Hybrid Hidden Markov Model Deep Neural Network (HMM/DNN)-systems …

Finnish parliament ASR corpus: Analysis, benchmarks and statistics

A Virkkunen, A Rouhe, N Phan, M Kurimo - Language Resources and …, 2023 - Springer
Public sources like parliament meeting recordings and transcripts provide ever-growing
material for the training and evaluation of automatic speech recognition (ASR) systems. In …

Audio-visual multi-channel integration and recognition of overlapped speech

J Yu, SX Zhang, B Wu, S Liu, S Hu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Automatic speech recognition (ASR) technologies have been significantly advanced in the
past few decades. However, recognition of overlapped speech remains a highly challenging …

Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

A Vyas, S Madikeri, H Bourlard - arXiv preprint arXiv:2104.02558, 2021 - arxiv.org
In this work, we investigate if the wav2vec 2.0 self-supervised pretraining helps mitigate the
overfitting issues with connectionist temporal classification (CTC) training to reduce its …

Pkwrap: a PyTorch package for LF-MMI training of acoustic models

S Madikeri, S Tong, J Zuluaga-Gomez, A Vyas… - arXiv preprint arXiv …, 2020 - arxiv.org
We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's
LF-MMI training framework. The wrapper, called pkwrap (short form of PyTorch kaldi …

Wake word detection with alignment-free lattice-free MMI

Y Wang, H Lv, D Povey, L Xie, S Khudanpur - arXiv preprint arXiv …, 2020 - arxiv.org
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word
to start processing spoken input. We present novel methods to train a hybrid DNN/HMM …