Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Context-aware transformer transducer for speech recognition

FJ Chang, J Liu, M Radfar, A Mouchtaris… - 2021 IEEE automatic …, 2021 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty
recognizing uncommon words that appear infrequently in the training data. One promising …

A virtual simulation-pilot agent for training of air traffic controllers

J Zuluaga-Gomez, A Prasad, I Nigmatulina, P Motlicek… - Aerospace, 2023 - mdpi.com
In this paper, we propose a novel virtual simulation-pilot engine for speeding up air traffic
controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) …

Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding

J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …

Audio caption: Listen and tell

M Wu, H Dinkel, K Yu - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
An increasing amount of research has shed light on machine perception of audio events, most
of which concerns detection and classification tasks. However, human-like perception of …

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.

Z Chen, M Jain, Y Wang, ML Seltzer, C Fuegen - Interspeech, 2019 - isca-archive.org
End-to-end approaches to automatic speech recognition, such as Listen-Attend-Spell (LAS),
blend all components of a traditional speech recognizer into a unified model. Although this …

Class LM and word mapping for contextual biasing in end-to-end ASR

R Huang, O Abdel-Hamid, X Li, G Evermann - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, all-neural, end-to-end (E2E) ASR systems have gained rapid interest in the
speech recognition community. They convert speech input to text units in a single trainable …

Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Contextual information plays a crucial role in speech recognition technologies, and
incorporating it into end-to-end speech recognition models has drawn immense interest …

Can contextual biasing remain effective with Whisper and GPT-2?

G Sun, X Zheng, C Zhang, PC Woodland - arXiv preprint arXiv:2306.01942, 2023 - arxiv.org
End-to-end automatic speech recognition (ASR) and large language models, such as
Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite …

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.

Z Chen, A Rosenberg, Y Zhang, G Wang… - …, 2020 - interspeech2020.org
Text-to-Speech synthesis (TTS)-based data augmentation is a relatively new mechanism for
utilizing text-only data to improve automatic speech recognition (ASR) training without …