A robust approach to multimodal deepfake detection

D Salvi, H Liu, S Mandelli, P Bestagini, W Zhou… - Journal of …, 2023 - mdpi.com
The widespread use of deep learning techniques for creating realistic synthetic media,
commonly known as deepfakes, poses a significant threat to individuals, organizations, and …

Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding

J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …

A virtual simulation-pilot agent for training of air traffic controllers

J Zuluaga-Gomez, A Prasad, I Nigmatulina, P Motlicek… - Aerospace, 2023 - mdpi.com
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic
controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) …

Development of supervised speaker diarization system based on the pyannote audio processing library

V Khoma, Y Khoma, V Brydinskyi, A Konovalov - Sensors, 2023 - mdpi.com
Diarization is an important task when working with audio data, as it provides a
solution to the problem related to the need of dividing one analyzed call recording into …

An assessment of in-the-wild datasets for multimodal emotion recognition

A Aguilera, D Mellado, F Rojas - Sensors, 2023 - mdpi.com
Multimodal emotion recognition implies the use of different resources and techniques for
identifying and recognizing human emotions. A variety of data sources such as faces …

Improving hybrid CTC/attention architecture for agglutinative language speech recognition

Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com
Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …

Characterization of deep learning-based speech-enhancement techniques in online audio processing applications

C Rascon - Sensors, 2023 - mdpi.com
Deep learning-based speech-enhancement techniques have recently been an area of
growing interest, since their impressive performance can potentially benefit a wide variety of …

Attention-based fusion of ultrashort voice utterances and depth videos for multimodal person identification

A Moufidi, D Rousseau, P Rasti - Sensors, 2023 - mdpi.com
Multimodal deep learning, in the context of biometrics, encounters significant challenges
due to the dependence on long speech utterances and RGB images, which are often …

Multimodal sentiment analysis in realistic environments based on cross-modal hierarchical fusion network

J Huang, P Lu, S Sun, F Wang - Electronics, 2023 - mdpi.com
In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of
sentiments by fusing multimodal information, thereby enhancing the understanding of real …

Preliminary technical validation of LittleBeats™: A multimodal sensing platform to capture cardiac physiology, motion, and vocalizations

B Islam, NL McElwain, J Li, MI Davila, Y Hu, K Hu… - Sensors, 2024 - mdpi.com
Across five studies, we present the preliminary technical validation of an infant-wearable
platform, LittleBeats™, that integrates electrocardiogram (ECG), inertial measurement unit …