Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

SLURP: A spoken language understanding resource package

E Bastianelli, A Vanzo, P Swietojanski… - arXiv preprint arXiv …, 2020 - arxiv.org
Spoken Language Understanding infers semantic meaning directly from audio data, and
thus promises to reduce error propagation and misunderstandings in end-user applications …

Deep learning in diverse intelligent sensor based systems

Y Zhu, M Wang, X Yin, J Zhang, E Meijering, J Hu - Sensors, 2022 - mdpi.com
Deep learning has become a predominant method for solving data analysis problems in
virtually all fields of science and engineering. The increasing complexity and the large …

Large-scale ASR domain adaptation using self- and semi-supervised learning

D Hwang, A Misra, Z Huo, N Siddhartha… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Self- and semi-supervised learning methods have been actively investigated to reduce
labeled training data or enhance model performance. However, these approaches mostly …

Unsupervised domain adaptation for speech recognition via uncertainty driven self-training

S Khurana, N Moritz, T Hori… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
The performance of automatic speech recognition (ASR) systems typically degrades
significantly when the training and test data domains are mismatched. In this paper, we …

Modular domain adaptation for Conformer-based streaming ASR

Q Li, B Li, D Hwang, TN Sainath… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech data from different domains has distinct acoustic and linguistic characteristics. It is
common to train a single multidomain model such as a Conformer transducer for speech …

Confidence score based speaker adaptation of conformer speech recognition systems

J Deng, X Xie, T Wang, M Cui, B Xue… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Speaker adaptation techniques provide a powerful solution to customise automatic speech
recognition (ASR) systems for individual users. Practical application of unsupervised model …

On addressing practical challenges for RNN-transducer

R Zhao, J Xue, J Li, W Wei, L He… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this paper, several works are proposed to address practical challenges for deploying
RNN Transducer (RNN-T) based speech recognition systems. These challenges are …

Generalizing speaker verification for spoof awareness in the embedding space

X Liu, M Sahidullah, KA Lee… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
It is now well-known that automatic speaker verification (ASV) systems can be spoofed using
various types of adversaries. The usual approach to counteract ASV systems against such …