Learning modality-agnostic representation for semantic segmentation from any modalities
Image modality is not perfect as it often fails in certain conditions, eg, night and fast motion.
This significantly limits the robustness and versatility of existing multi-modal (ie, Image+ X) …
This significantly limits the robustness and versatility of existing multi-modal (ie, Image+ X) …
Centering the value of every modality: Towards efficient and resilient modality-agnostic semantic segmentation
Fusing an arbitrary number of modalities is vital for achieving robust multi-modal fusion of
semantic segmentation yet remains less explored to date. Recent endeavors regard RGB …
semantic segmentation yet remains less explored to date. Recent endeavors regard RGB …
Towards Modality Generalization: A Benchmark and Prospective Analysis
Multi-modal learning has achieved remarkable success by integrating information from
various modalities, achieving superior performance in tasks like recognition and retrieval …
various modalities, achieving superior performance in tasks like recognition and retrieval …
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Research on multi-modal learning dominantly aligns the modalities in a unified space at
training, and only a single one is taken for prediction at inference. However, for a real …
training, and only a single one is taken for prediction at inference. However, for a real …
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning
The development of Large Language Models (LLMs) has significantly advanced various AI
applications in commercial and scientific research fields, such as scientific literature …
applications in commercial and scientific research fields, such as scientific literature …
MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection
In this paper, we address the challenging modality-agnostic semantic segmentation (MaSS),
aiming at centering the value of every modality at every feature granularity. Training with all …
aiming at centering the value of every modality at every feature granularity. Training with all …
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Multi-modal large language models (MLLMs) have shown remarkable abilities in various
visual understanding tasks. However, MLLMs still struggle with fine-grained visual …
visual understanding tasks. However, MLLMs still struggle with fine-grained visual …
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Multimodal learning plays a crucial role in enabling machine learning models to fuse and
utilize diverse data sources, such as text, images, and audio, to support a variety of …
utilize diverse data sources, such as text, images, and audio, to support a variety of …