Learning modality-agnostic representation for semantic segmentation from any modalities

X Zheng, Y Lyu, L Wang - European Conference on Computer Vision, 2024 - Springer
The image modality is not perfect, as it often fails in certain conditions, e.g., night and fast motion.
This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) …

Centering the value of every modality: Towards efficient and resilient modality-agnostic semantic segmentation

X Zheng, Y Lyu, J Zhou, L Wang - European Conference on Computer …, 2024 - Springer
Fusing an arbitrary number of modalities is vital for achieving robust multi-modal fusion for
semantic segmentation, yet remains less explored to date. Recent endeavors regard RGB …

Towards Modality Generalization: A Benchmark and Prospective Analysis

X Liu, X Xia, Z Huang, TS Chua - arXiv preprint arXiv:2412.18277, 2024 - arxiv.org
Multi-modal learning has achieved remarkable success by integrating information from
various modalities, achieving superior performance in tasks like recognition and retrieval …

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All

Y Lyu, X Zheng, D Kim, L Wang - arXiv preprint arXiv:2405.16108, 2024 - arxiv.org
Research on multi-modal learning predominantly aligns the modalities in a unified space at
training, and only a single one is taken for prediction at inference. However, for a real …

Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning

H Wu, X Li, X Xu, J Wu, D Zhang, Z Liu - arXiv preprint arXiv:2410.12130, 2024 - arxiv.org
The development of Large Language Models (LLMs) has significantly advanced various AI
applications in commercial and scientific research fields, such as scientific literature …

MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection

X Zheng, Y Lyu, L Jiang, J Zhou, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we address the challenging modality-agnostic semantic segmentation (MaSS),
aiming at centering the value of every modality at every feature granularity. Training with all …

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models

H He, G Li, Z Geng, J Xu, Y Peng - arXiv preprint arXiv:2501.15140, 2025 - arxiv.org
Multi-modal large language models (MLLMs) have shown remarkable abilities in various
visual understanding tasks. However, MLLMs still struggle with fine-grained visual …

Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations

M Jeong, M Namgung, ZM Kim, D Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal learning plays a crucial role in enabling machine learning models to fuse and
utilize diverse data sources, such as text, images, and audio, to support a variety of …