A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
Most recommender systems adopt collaborative filtering (CF) and provide recommendations
based on past collective interactions. Therefore, the performance of CF algorithms degrades …
Attribute-guided cross-modal interaction and enhancement for audio-visual matching
Audio-visual matching is an essential task that measures the correlation between audio clips
and visual images. However, current methods rely solely on the joint embedding of global …
Multi-stage Face-voice Association Learning with Keynote Speaker Diarization
The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …
DCTM: Dilated convolutional transformer model for multimodal engagement estimation in conversation
Conversational engagement estimation is posed as a regression problem, entailing the
identification of the favorable attention and involvement of the participants in the …
Multimodal Representation Learning for High-Quality Recommendations in Cold-Start and Beyond-Accuracy
M Moscati - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
Recommender systems (RS) traditionally leverage the large amount of user–item interaction
data. This exposes RS to a lower recommendation quality in cold-start scenarios, as well as …
Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan
The advancements of technology have led to the use of multimodal systems in various real-
world applications. Among them, the audio-visual systems are one of the widely used …
One Model to Rule Them All: A Universal Transformer for Biometric Matching
This study introduces the first single branch network designed to tackle a spectrum of
biometric matching scenarios, including unimodal, multimodal, cross-modal, and missing …
Multimodal pre-train then transfer learning approach for speaker recognition
Cognitive science has well-established the correlation between faces and voices because
neuro-cognitive pathways of both information share the same structure. Recently, the task …
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach
Multimodal networks have demonstrated remarkable performance improvements over their
unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion …
Public-Private Attributes-Based Variational Adversarial Network for Audio-Visual Cross-Modal Matching
A Zheng, F Yuan, H Zhang, J Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Existing audio-visual cross-modal matching methods focus on mitigating cross-modal
heterogeneity but ignore the impact of intra-class discrepancy of the same identity in …