Spoken instruction understanding in air traffic control: Challenge, technique, and application

Y Lin - Aerospace, 2021 - mdpi.com
In air traffic control (ATC), speech communication with radio transmission is the primary way
to exchange information between the controller and aircrew. A wealth of contextual …

From English to more languages: Parameter-efficient model reprogramming for cross-lingual speech recognition

CHH Yang, B Li, Y Zhang, N Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this work, we propose a new parameter-efficient learning framework based on neural
model reprogramming for cross-lingual speech recognition, which can re-purpose well …

Class LM and word mapping for contextual biasing in end-to-end ASR

R Huang, O Abdel-Hamid, X Li, G Evermann - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, all-neural, end-to-end (E2E) ASR systems gained rapid interest in the
speech recognition community. They convert speech input to text units in a single trainable …

Adversarial meta sampling for multilingual low-resource speech recognition

Y Xiao, K Gong, P Zhou, G Zheng, X Liang… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Low-resource automatic speech recognition (ASR) is challenging, as the low-resource target
language data cannot well train an ASR model. To solve this issue, meta-learning …

ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

Y Lin, B Yang, L Li, D Guo, J Zhang, H Chen… - Applied Soft …, 2021 - Elsevier
In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to
tackle the issue of translating communication speech into human-readable text in air traffic …

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

F Yu, Z Yao, X Wang, K An, L Xie, Z Ou… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Automatic speech recognition (ASR) has been significantly advanced with the use of deep
learning and big data. However, improving robustness, including achieving equally good …

SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability

X Shi, Y Yang, Z Li, Y Chen, Z Gao… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Hotword customization is one of the concerned issues remained in ASR field - it is of value to
enable users of ASR systems to customize names of entities, persons and other phrases to …

Differentiable allophone graphs for language-universal speech recognition

B Yan, S Dalmia, DR Mortensen, F Metze… - arXiv preprint arXiv …, 2021 - arxiv.org
Building language-universal speech recognition systems entails producing phonological
units of spoken sound that can be shared across languages. While speech annotations at …

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Z Wu, G Song, C Li, P Rondon, Z Meng, X Velez… - arXiv preprint arXiv …, 2024 - arxiv.org
Contextual biasing enables speech recognizers to transcribe important phrases in the
speaker's context, such as contact names, even if they are rare in, or absent from, the …

A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models

M Zeineldeen, A Zeyer, W Zhou, T Ng… - arXiv preprint arXiv …, 2020 - arxiv.org
Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention
models for automatic speech recognition (ASR) use graphemes or grapheme-based …