Single-branch network for multimodal training

MS Saeed, S Nawaz, MH Khan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
With the rapid growth of social media platforms, users are sharing billions of multimedia
posts containing audio, images, and text. Researchers have focused on building …

Speaker recognition in realistic scenario using multimodal data

SH Shah, MS Saeed, S Nawaz… - 2023 3rd International …, 2023 - ieeexplore.ieee.org
In recent years, an association is established between faces and voices of celebrities
leveraging large scale audio-visual information from YouTube. The availability of large scale …

DCTM: Dilated convolutional transformer model for multimodal engagement estimation in conversation

VN Tu, VT Huynh, HJ Yang, SH Kim, S Nawaz… - Proceedings of the 31st …, 2023 - dl.acm.org
Conversational engagement estimation is posed as a regression problem, entailing the
identification of the favorable attention and involvement of the participants in the …

Multimodal pre-train then transfer learning approach for speaker recognition

S Jabeen, MS Amin, X Li - Multimedia Tools and Applications, 2024 - Springer
Cognitive science has well-established the correlation between faces and voices because
neuro-cognitive pathways of both information share the same structure. Recently, the task …

Audio–Visual Fusion Based on Interactive Attention for Person Verification

X Jing, L He, Z Song, S Wang - Sensors, 2023 - mdpi.com
With the rapid development of multimedia technology, personnel verification systems have
become increasingly important in the security field and identity verification. However …