When and why vision-language models behave like bags-of-words, and what to do about it?

M Yuksekgonul, F Bianchi, P Kalluri, D Jurafsky… - ar…

…, not attention

P Mehrani, JK Tsotsos - Frontiers in Computer Science, 2023 - frontiersin.org
Recently, a considerable number of studies in computer vision involve deep neural
architectures called vision transformers. Visual processing in these models incorporates …

Compositionality in perception: A framework

KJ Lande - Wiley Interdisciplinary Reviews: Cognitive Science, 2024 - Wiley Online Library
Perception involves the processing of content or information about the world. In what form is
this content represented? I argue that perception is widely compositional. The perceptual …

Imagine the unseen world: a benchmark for systematic generalization in visual world models

Y Kim, G Singh, J Park… - Advances in Neural …, 2024 - proceedings.neurips.cc
Systematic compositionality, or the ability to adapt to novel situations by creating a mental
model of the world using reusable pieces of knowledge, remains a significant challenge in …

Does continual learning meet compositionality? new benchmarks and an evaluation framework

W Liao, Y Wei, M Jiang, Q Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Compositionality facilitates the comprehension of novel objects using acquired concepts
and the maintenance of a knowledge pool. This is particularly crucial for continual learners …

Emergent communication for rules reasoning

Y Guo, Y Hao, R Zhang, E Zhou, Z Du… - Advances in …, 2024 - proceedings.neurips.cc
Research on emergent communication between deep-learning-based agents has received
extensive attention due to its inspiration for linguistics and artificial intelligence. However …

R-VQA: A robust visual question answering model

S Chowdhury, B Soni - Knowledge-Based Systems, 2025 - Elsevier
Visual Question Answering (VQA) involves generating answers to questions about
visual content, such as images. VQA models process an image and a question to produce …

Benchmarking Robustness of Text-Image Composed Retrieval

S Sun, J Gu, S Gong - arXiv preprint arXiv:2311.14837, 2023 - suntongtongtong.github.io
Text-image composed retrieval aims to retrieve the target image through the composed
query, which is specified in the form of an image plus some text that describes desired …