Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Heterogeneous contrastive learning for foundation models and beyond

L Zheng, B Jing, Z Li, H Tong, J He - Proceedings of the 30th ACM …, 2024 - dl.acm.org
In the era of big data and Artificial Intelligence, an emerging paradigm is to utilize contrastive
self-supervised learning to model large-scale heterogeneous data. Many existing foundation …

Multi-level contrastive learning: Hierarchical alleviation of heterogeneity in multimodal sentiment analysis

C Fan, K Zhu, J Tao, G Yi, J Xue… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recently, multimodal fusion efforts have achieved remarkable success in Multimodal
Sentiment Analysis (MSA). However, most of the existing methods are based on model-level …

Interpretable diffusion via information decomposition

X Kong, O Liu, H Li, D Yogatama, GV Steeg - ar…
… multiple modalities to a target label.
Previous studies in this field have concentrated on capturing in isolation either the inter …

Demonstrating and reducing shortcuts in vision-language representation learning

M Bleeker, M Hendriksen, A Yates… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) mainly rely on contrastive training to learn general-purpose
representations of images and captions. We focus on the situation when one image is …

Reconboost: Boosting can achieve modality reconcilement

C Hua, Q Xu, S Bao, Z Yang, Q Huang - arXiv preprint arXiv:2405.09321, 2024 - arxiv.org
This paper explores a novel multi-modal alternating learning paradigm pursuing a
reconciliation between the exploitation of uni-modal features and the exploration of cross …

Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities

A Saporta, AM Puli, M Goldstein… - Advances in Neural …, 2025 - proceedings.neurips.cc
Contrastive learning methods, such as CLIP, leverage naturally paired data—for example,
images and their corresponding text captions—to learn general representations that transfer …