Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Neural motifs: Scene graph parsing with global context

R Zellers, M Yatskar, S Thomson… - Proceedings of the …, 2018 - openaccess.thecvf.com
We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …

Visual spatial reasoning

F Liu, G Emerson, N Collier - Transactions of the Association for …, 2023 - direct.mit.edu
Spatial relations are a basic part of human cognition. However, they are expressed in
natural language in a variety of ways, and previous work has suggested that current vision …

Visual commonsense R-CNN

T Wang, J Huang, H Zhang… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a novel unsupervised feature representation learning method, Visual
Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an …

Modeling relationships in referential expressions with compositional modular networks

R Hu, M Rohrbach, J Andreas… - Proceedings of the …, 2017 - openaccess.thecvf.com
People often refer to entities in an image in terms of their relationships with other entities. For
example, "the black cat sitting under the table" refers to both a "black cat" entity and its …

Weakly-supervised learning of visual relations

J Peyre, J Sivic, I Laptev… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
This paper introduces a novel approach for modeling visual relations between pairs of
objects. We define a relation as a triplet of the form (subject, predicate, object), where the predicate …
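For readers unfamiliar with this formulation, the sketch below shows one common way such (subject, predicate, object) relation triplets are represented in code. The class and field names are illustrative assumptions, not taken from the paper itself.

```python
from dataclasses import dataclass

# Minimal sketch of the (subject, predicate, object) triplet described in
# the abstract; names here are hypothetical, not from the paper's code.
@dataclass(frozen=True)
class VisualRelation:
    subject: str    # e.g. "person"
    predicate: str  # e.g. "riding"
    object: str     # e.g. "horse"

relations = [
    VisualRelation("person", "riding", "horse"),
    VisualRelation("black cat", "under", "table"),
]

for r in relations:
    print(f"({r.subject}, {r.predicate}, {r.object})")
```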

PIGLeT: Language grounding through neuro-symbolic interaction in a 3D world

R Zellers, A Holtzman, M Peters, R Mottaghi… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose PIGLeT: a model that learns physical commonsense knowledge through
interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a …

Things not written in text: Exploring spatial commonsense from visual signals

X Liu, D Yin, Y Feng, D Zhao - arXiv preprint arXiv:2203.08075, 2022 - arxiv.org
Spatial commonsense, the knowledge about spatial position and relationship between
objects (like the relative size of a lion and a girl, and the position of a boy relative to a bicycle …

Envisioning narrative intelligence: A creative visual storytelling anthology

BA Halperin, SM Lukin - Proceedings of the 2023 CHI Conference on …, 2023 - dl.acm.org
In this paper, we collect an anthology of 100 visual stories from authors who participated in
our systematic creative process of improvised story-building based on image sequences …

Text2Scene: Generating compositional scenes from textual descriptions

F Tan, S Feng, V Ordonez - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
In this paper, we propose Text2Scene, a model that generates various forms of
compositional scene representations from natural language descriptions. Unlike recent …