Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

T Salzmann, M Ryll, A Bewley, M Minderer - European Conference on …, 2024 - Springer
Visual relationship detection aims to identify objects and their relationships in images. Prior
methods approach this task by adding separate relationship modules or decoders to existing …

Semantic Diversity-Aware Prototype-Based Learning for Unbiased Scene Graph Generation

J Jeon, K Kim, K Yoon, C Park - European Conference on Computer …, 2024 - Springer
The scene graph generation (SGG) task involves detecting objects within an image and
predicting predicates that represent the relationships between the objects. However, in SGG …

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

B Jiang, Z Zhuang, SS Shivakumar… - arXiv preprint arXiv …, 2023 - arxiv.org
This work introduces an enhanced approach to generating scene graphs by incorporating
both a relationship hierarchy and commonsense knowledge. Specifically, we begin by …

Enabling Perspective-Aware Ai with Contextual Scene Graph Generation

D Platnick, M Alirezaie, H Rahnama - Information, 2024 - mdpi.com
This paper advances contextual image understanding within perspective-aware Ai (PAi), an
emerging paradigm in human–computer interaction that enables users to perceive and …

SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials

W Kim, S Park, Y In, S Han, C Park - arXiv preprint arXiv:2405.00021, 2024 - arxiv.org
Recently, interpreting complex charts with logical reasoning has emerged as a challenge
due to the development of vision-language models. A prior state-of-the-art (SOTA) model …

HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing

Z Zhu, H Zhang, G Wu, S Lyu, B Wu - arXiv preprint arXiv:2412.05685, 2024 - arxiv.org
Visual-textual inconsistency (VTI) evaluation plays a crucial role in cleansing
vision-language data. Its main challenges stem from the high variety of image captioning datasets …

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation

TT Nguyen, P Nguyen, J Cothren, A Yilmaz… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal LLMs have advanced vision-language tasks but still struggle with understanding
video scenes. To bridge this gap, Video Scene Graph Generation (VidSGG) has emerged to …

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

TT Nguyen, P Nguyen, X Li, J Cothren, A Yilmaz… - arXiv preprint arXiv …, 2024 - arxiv.org
Video scene graph generation (VidSGG) has emerged as a transformative approach to
capturing and interpreting the intricate relationships among objects and their temporal …

Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph

S Linok, T Zemskova, S Ladanova, R Titkov… - arXiv preprint arXiv …, 2024 - arxiv.org
Locating objects referred to in natural language poses a significant challenge for
autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform …

Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation

R Peddi, AA Shrivastava, P Singla, V Gogate - arXiv preprint arXiv …, 2024 - arxiv.org
Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation
of dynamic scenes by modelling objects and their evolving relationships over time. However …