- Academic Search

Cited by 7 · All 15 versions

Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding

J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com

Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …

Cited by 27 · All 14 versions

A virtual simulation-pilot agent for training of air traffic controllers

J Zuluaga-Gomez, A Prasad, I Nigmatulina, P Motlicek… - Aerospace, 2023 - mdpi.com

In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic
controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) …

Cited by 15 · All 9 versions

Development of supervised speaker diarization system based on the pyannote audio processing library

V Khoma, Y Khoma, V Brydinskyi, A Konovalov - Sensors, 2023 - mdpi.com

Diarization is an important task when work with audiodata is executed, as it provides a
solution to the problem related to the need of dividing one analyzed call recording into …

Cited by 11 · All 7 versions

An assessment of in-the-wild datasets for multimodal emotion recognition

A Aguilera, D Mellado, F Rojas - Sensors, 2023 - mdpi.com

Multimodal emotion recognition implies the use of different resources and techniques for
identifying and recognizing human emotions. A variety of data sources such as faces …

Cited by 17 · All 8 versions

Improving hybrid CTC/attention architecture for agglutinative language speech recognition

Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com

Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …

Cited by 13 · All 9 versions

Characterization of deep learning-based speech-enhancement techniques in online audio processing applications

C Rascon - Sensors, 2023 - mdpi.com

Deep learning-based speech-enhancement techniques have recently been an area of
growing interest, since their impressive performance can potentially benefit a wide variety of …

Cited by 7 · All 9 versions

Attention-based fusion of ultrashort voice utterances and depth videos for multimodal person identification

A Moufidi, D Rousseau, P Rasti - Sensors, 2023 - mdpi.com

Multimodal deep learning, in the context of biometrics, encounters significant challenges
due to the dependence on long speech utterances and RGB images, which are often …

Cited by 7 · All 3 versions

Multimodal sentiment analysis in realistic environments based on cross-modal hierarchical fusion network

J Huang, P Lu, S Sun, F Wang - Electronics, 2023 - mdpi.com

In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of
sentiments by fusing multimodal information, thereby enhancing the understanding of real …