Scene-Graph ViT: End-to-end open-vocabulary visual relationship detection
Visual relationship detection aims to identify objects and their relationships in images. Prior
methods approach this task by adding separate relationship modules or decoders to existing …
Semantic Diversity-Aware Prototype-Based Learning for Unbiased Scene Graph Generation
The scene graph generation (SGG) task involves detecting objects within an image and
predicting predicates that represent the relationships between the objects. However, in SGG …
Enhancing scene graph generation with hierarchical relationships and commonsense knowledge
This work introduces an enhanced approach to generating scene graphs by incorporating
both a relationship hierarchy and commonsense knowledge. Specifically, we begin by …
Enabling Perspective-Aware Ai with Contextual Scene Graph Generation
This paper advances contextual image understanding within perspective-aware Ai (PAi), an
emerging paradigm in human–computer interaction that enables users to perceive and …
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Recently, interpreting complex charts with logical reasoning has emerged as a challenge
due to the development of vision-language models. A prior state-of-the-art (SOTA) model …
HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing
Visual-textual inconsistency (VTI) evaluation plays a crucial role in cleansing vision-
language data. Its main challenges stem from the high variety of image captioning datasets …
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Multimodal LLMs have advanced vision-language tasks but still struggle with understanding
video scenes. To bridge this gap, Video Scene Graph Generation (VidSGG) has emerged to …
CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
Video scene graph generation (VidSGG) has emerged as a transformative approach to
capturing and interpreting the intricate relationships among objects and their temporal …
Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph
Locating objects referred to in natural language poses a significant challenge for
autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform …
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation
of dynamic scenes by modelling objects and their evolving relationships over time. However …