Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Multimodal prompting with missing modalities for visual recognition

YL Lee, YH Tsai, WC Chiu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when
missing-modality occurs either during training or testing in real-world situations; and 2) when …

Multi-modal learning with missing modality via shared-specific feature modelling

H Wang, Y Chen, C Ma, J Avery… - Proceedings of the …, 2023 - openaccess.thecvf.com
The missing modality issue is critical but non-trivial to be solved by multi-modal models.
Current methods aiming to handle the missing modality problem in multi-modal tasks, either …

Are multimodal transformers robust to missing modality?

M Ma, J Ren, L Zhao, D Testuggine… - Proceedings of the …, 2022 - openaccess.thecvf.com
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …

Multimodal learning with graphs

Y Ektefaie, G Dasoulas, A Noori, M Farhat… - Nature Machine …, 2023 - nature.com
Artificial intelligence for graphs has achieved remarkable success in modelling complex
systems, ranging from dynamic networks in biology to interacting particle systems in physics …

Single-model and any-modality for video object tracking

Z Wu, J Zheng, X Ren, FA Vasluianu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the realm of video object tracking, auxiliary modalities such as depth, thermal, or event data
have emerged as valuable assets to complement the RGB trackers. In practice, most existing …

Multimodal federated learning: A survey

L Che, J Wang, Y Zhou, F Ma - Sensors, 2023 - mdpi.com
Federated learning (FL), which provides a collaborative training scheme for distributed data
sources with privacy concerns, has become a burgeoning and attractive research area. Most …

Multibench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - pmc.ncbi.nlm.nih.gov
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …

Multimodal variational auto-encoder based audio-visual segmentation

Y Mao, J Zhang, M Xiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …