Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers

A Sundar, J Xu, W Gay… - Advances in Neural …, 2025 - proceedings.neurips.cc
An emerging area of research in situated and multimodal interactive conversations (SIMMC)
includes interactions in scientific papers. Since scientific papers are primarily composed of …

SensorQA: A Question Answering Benchmark for Daily-Life Monitoring

B Reichman, X Yu, L Hu, J Truxal, A Jain… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rapid growth in sensor data, effectively interpreting and interfacing with these data
in a human-understandable way has become crucial. While existing research primarily …

Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering

B Reichman, L Heck - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In many language processing tasks including most notably Large Language Modeling
(LLM), retrieval augmentation improves the performance of the models by adding …

Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models

J Tian, C Huang, Z Kira - arXiv preprint arXiv:2411.01713, 2024 - arxiv.org
Modern optimizers such as AdamW, equipped with momentum and adaptive learning rate,
are designed to escape local minima and explore the vast parameter space. This …

mForms: Multimodal Form-Filling with Question Answering

L Heck, S Heck, A Sundar - arXiv preprint arXiv:2011.12340, 2020 - arxiv.org
This paper presents a new approach to form-filling by reformulating the task as multimodal
natural language Question Answering (QA). The reformulation is achieved by first translating …

Knowledge Graphs for Multi-Modal Learning: Survey and Perspective

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - Available at SSRN … - papers.ssrn.com
Integrated with multi-modal learning, knowledge graphs (KGs) as structured knowledge
repositories, can enhance AI for processing and understanding complex, real-world data …

Directional Gradient Projection for Robust Fine-tuning of Foundation Models

C Huang, J Tian, B Maneechotesuwan… - … Conference on Learning … - openreview.net
Robust fine-tuning aims to adapt large foundation models to downstream tasks while
preserving their robustness to distribution shifts. Existing methods primarily focus on …

[CITATION][C] Multimodal Learning for Visual Question Answering using World Knowledge