Exploring neural transducers for end-to-end speech recognition
In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and
attention-based Seq2Seq models for end-to-end speech recognition. We show that, without …
Per-channel energy normalization: Why and how
In the context of automatic speech recognition and acoustic event detection, an adaptive
procedure named per-channel energy normalization (PCEN) has recently shown to …
Improved training for online end-to-end speech recognition systems
Achieving high accuracy with end-to-end speech recognizers requires careful parameter
initialization prior to training. Otherwise, the networks may fail to find a good local optimum …
Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.
Prior work has shown that connectionist temporal classification (CTC)-based automatic
speech recognition systems perform well when using bidirectional long short-term memory …
Learning to detect dysarthria from raw speech
Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-
level features, by selecting the relevant information for the task at hand. We explore an …
On front-end gain invariant modeling for wake word spotting
Wake word (WW) spotting is challenging in far-field due to the complexities and variations in
acoustic conditions and the environmental interference in signal transmission. A suite of …
Improving knowledge distillation of CTC-trained acoustic models with alignment-consistent ensemble and target delay
Knowledge distillation (KD) has been widely used to improve the performance of a simpler
student model by imitating the outputs or intermediate representations of a more complex …
Acoustic domain mismatch compensation in bird audio detection
T Tang, Y Long, Y Li, J Liang - International Journal of Speech Technology, 2022 - Springer
Detecting bird calls in audio is an important task for automatic wildlife monitoring, as well as
in citizen science and audio library management. This paper presents front-end acoustic …