A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios

C Ganhör, M Moscati, A Hausberger, S Nawaz… - Proceedings of the 18th …, 2024 - dl.acm.org
Most recommender systems adopt collaborative filtering (CF) and provide recommendations
based on past collective interactions. Therefore, the performance of CF algorithms degrades …

Attribute-guided cross-modal interaction and enhancement for audio-visual matching

J Wang, A Zheng, Y Yan, R He… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Audio-visual matching is an essential task that measures the correlation between audio clips
and visual images. However, current methods rely solely on the joint embedding of global …

Multi-stage Face-voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org
The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

DCTM: Dilated convolutional transformer model for multimodal engagement estimation in conversation

VN Tu, VT Huynh, HJ Yang, SH Kim, S Nawaz… - Proceedings of the 31st …, 2023 - dl.acm.org
Conversational engagement estimation is posed as a regression problem, entailing the
identification of the favorable attention and involvement of the participants in the …

Multimodal Representation Learning for High-Quality Recommendations in Cold-Start and Beyond-Accuracy

M Moscati - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
Recommender systems (RS) traditionally leverage the large amount of user–item interaction
data. This exposes RS to a lower recommendation quality in cold-start scenarios, as well as …

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

MS Saeed, S Nawaz, MS Tahir, RK Das… - arXiv preprint arXiv …, 2024 - arxiv.org
The advancements of technology have led to the use of multimodal systems in various
real-world applications. Among them, the audio-visual systems are one of the widely used …

One Model to Rule Them All: A Universal Transformer for Biometric Matching

M Abdrakhmanova, A Yermekova, Y Barko… - IEEE …, 2024 - ieeexplore.ieee.org
This study introduces the first single branch network designed to tackle a spectrum of
biometric matching scenarios, including unimodal, multimodal, cross-modal, and missing …

Multimodal pre-train then transfer learning approach for speaker recognition

S Jabeen, MS Amin, X Li - Multimedia Tools and Applications, 2024 - Springer
Cognitive science has well-established the correlation between faces and voices because
neuro-cognitive pathways of both information share the same structure. Recently, the task …

Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach

MS Saeed, S Nawaz, MZ Zaheer, MH Khan… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal networks have demonstrated remarkable performance improvements over their
unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion …

Public-Private Attributes-Based Variational Adversarial Network for Audio-Visual Cross-Modal Matching

A Zheng, F Yuan, H Zhang, J Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Existing audio-visual cross-modal matching methods focus on mitigating cross-modal
heterogeneity but ignore the impact of intra-class discrepancy of the same identity in …