Exploring neural transducers for end-to-end speech recognition

E Battenberg, J Chen, R Child, A Coates… - 2017 IEEE Automatic …, 2017 - ieeexplore.ieee.org
In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and
attention-based Seq2Seq models for end-to-end speech recognition. We show that, without …

Per-channel energy normalization: Why and how

V Lostanlen, J Salamon, M Cartwright… - IEEE Signal …, 2018 - ieeexplore.ieee.org
In the context of automatic speech recognition and acoustic event detection, an adaptive
procedure named per-channel energy normalization (PCEN) has recently shown to …

Improved training for online end-to-end speech recognition systems

S Kim, ML Seltzer, J Li, R Zhao - arXiv preprint arXiv:1711.02212, 2017 - arxiv.org
Achieving high accuracy with end-to-end speech recognizers requires careful parameter
initialization prior to training. Otherwise, the networks may fail to find a good local optimum …

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition

K Audhkhasi, G Saon, Z Tüske, B Kingsbury… - Interspeech, 2019 - academia.edu
Prior work has shown that connectionist temporal classification (CTC)-based automatic
speech recognition systems perform well when using bidirectional long short-term memory …

Learning to detect dysarthria from raw speech

J Millet, N Zeghidour - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted
low-level features, by selecting the relevant information for the task at hand. We explore an …

On front-end gain invariant modeling for wake word spotting

Y Gao, ND Stein, CC Kao, Y Cai, M Sun… - arXiv preprint arXiv …, 2020 - arxiv.org
Wake word (WW) spotting is challenging in far-field due to the complexities and variations in
acoustic conditions and the environmental interference in signal transmission. A suite of …

Improving knowledge distillation of CTC-trained acoustic models with alignment-consistent ensemble and target delay

H Ding, K Chen, Q Huo - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
Knowledge distillation (KD) has been widely used to improve the performance of a simpler
student model by imitating the outputs or intermediate representations of a more complex …

Acoustic domain mismatch compensation in bird audio detection

T Tang, Y Long, Y Li, J Liang - International Journal of Speech Technology, 2022 - Springer
Detecting bird calls in audio is an important task for automatic wildlife monitoring, as well as
in citizen science and audio library management. This paper presents front-end acoustic …