When and why vision-language models behave like bags-of-words, and what to do about it?
M Yuksekgonul, F Bianchi, P Kalluri, D Jurafsky… - ar**, not attention
Recently, a considerable number of studies in computer vision involve deep neural
architectures called vision transformers. Visual processing in these models incorporates …
Compositionality in perception: A framework
KJ Lande - Wiley Interdisciplinary Reviews: Cognitive Science, 2024 - Wiley Online Library
Perception involves the processing of content or information about the world. In what form is
this content represented? I argue that perception is widely compositional. The perceptual …
Imagine the unseen world: a benchmark for systematic generalization in visual world models
Systematic compositionality, or the ability to adapt to novel situations by creating a mental
model of the world using reusable pieces of knowledge, remains a significant challenge in …
Does continual learning meet compositionality? new benchmarks and an evaluation framework
Compositionality facilitates the comprehension of novel objects using acquired concepts
and the maintenance of a knowledge pool. This is particularly crucial for continual learners …
Emergent communication for rules reasoning
Research on emergent communication between deep-learning-based agents has received
extensive attention due to its inspiration for linguistics and artificial intelligence. However …
R-VQA: A robust visual question answering model
Visual Question Answering (VQA) involves generating answers to questions about
visual content, such as images. VQA models process an image and a question to produce …
Benchmarking Robustness of Text-Image Composed Retrieval
Text-image composed retrieval aims to retrieve the target image through the composed
query, which is specified in the form of an image plus some text that describes desired …