Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A survey on evolutionary multiobjective feature selection in classification: approaches, applications, and challenges

R Jiao, BH Nguyen, B Xue… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Maximizing the classification accuracy and minimizing the number of selected features are
two primary objectives in feature selection, which is inherently a multiobjective task …

Vision-language models for medical report generation and visual question answering: A review

I Hartsock, G Rasool - Frontiers in Artificial Intelligence, 2024 - frontiersin.org
Medical vision-language models (VLMs) combine computer vision (CV) and natural
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …

Provable dynamic fusion for low-quality multimodal data

Q Zhang, H Wu, C Zhang, Q Hu, H Fu… - International …, 2023 - proceedings.mlr.press
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …

Modality competition: What makes joint training of multi-modal network fail in deep learning? (provably)

Y Huang, J Lin, C Zhou, H Yang… - … conference on machine …, 2022 - proceedings.mlr.press
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …

Multimodal dynamics: Dynamical fusion for trustworthy multimodal classification

Z Han, F Yang, J Huang, C Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Integration of heterogeneous and high-dimensional data (e.g., multiomics) is becoming
increasingly important. Existing multimodal classification algorithms mainly focus on …

On uni-modal feature learning in supervised multi-modal learning

C Du, J Teng, T Li, Y Liu, T Yuan… - International …, 2023 - proceedings.mlr.press
We abstract the features (i.e., learned representations) of multi-modal data into 1) uni-modal
features, which can be learned from uni-modal training, and 2) paired features, which can …

A survey on integrated sensing, communication, and computation

D Wen, Y Zhou, X Li, Y Shi, K Huang… - … Surveys & Tutorials, 2024 - ieeexplore.ieee.org
The forthcoming generation of wireless technology, 6G, promises a revolutionary leap
beyond traditional data-centric services. It aims to usher in an era of ubiquitous intelligent …

Divert more attention to vision-language tracking

M Guo, Z Zhang, H Fan, L Jing - Advances in Neural …, 2022 - proceedings.neurips.cc
Relying on Transformer for complex visual feature learning, object tracking has witnessed
the new standard for state-of-the-arts (SOTAs). However, this advancement is accompanied by …

Factorized contrastive learning: Going beyond multi-view redundancy

PP Liang, Z Deng, MQ Ma, JY Zou… - Advances in …, 2024 - proceedings.neurips.cc
In a wide range of multimodal tasks, contrastive learning has become a particularly
appealing approach since it can successfully learn representations from abundant …