Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
A survey on evolutionary multiobjective feature selection in classification: approaches, applications, and challenges
Maximizing the classification accuracy and minimizing the number of selected features are
two primary objectives in feature selection, which is inherently a multiobjective task …
two primary objectives in feature selection, which is inherently a multiobjective task …
Vision-language models for medical report generation and visual question answering: A review
Medical vision-language models (VLMs) combine computer vision (CV) and natural
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …
Provable dynamic fusion for low-quality multimodal data
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
Modality competition: What makes joint training of multi-modal network fail in deep learning?(provably)
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …
well-explained in theory. Recently, it has been observed that the best uni-modal network …
Multimodal dynamics: Dynamical fusion for trustworthy multimodal classification
Integration of heterogeneous and high-dimensional data (eg, multiomics) is becoming
increasingly important. Existing multimodal classification algorithms mainly focus on …
increasingly important. Existing multimodal classification algorithms mainly focus on …
On uni-modal feature learning in supervised multi-modal learning
We abstract the features (ie learned representations) of multi-modal data into 1) uni-modal
features, which can be learned from uni-modal training, and 2) paired features, which can …
features, which can be learned from uni-modal training, and 2) paired features, which can …
A survey on integrated sensing, communication, and computation
The forthcoming generation of wireless technology, 6G, promises a revolutionary leap
beyond traditional data-centric services. It aims to usher in an era of ubiquitous intelligent …
beyond traditional data-centric services. It aims to usher in an era of ubiquitous intelligent …
Divert more attention to vision-language tracking
Relying on Transformer for complex visual feature learning, object tracking has witnessed
the new standard for state-of-the-arts (SOTAs). However, this advancement accompanies by …
the new standard for state-of-the-arts (SOTAs). However, this advancement accompanies by …
Factorized contrastive learning: Going beyond multi-view redundancy
In a wide range of multimodal tasks, contrastive learning has become a particularly
appealing approach since it can successfully learn representations from abundant …
appealing approach since it can successfully learn representations from abundant …