Scene graph generation: A comprehensive survey

G Zhu, L Zhang, Y Jiang, Y Dang, H Hou… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning techniques have led to remarkable breakthroughs in the field of generic
object detection and have spawned a lot of scene-understanding tasks in recent years …

Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

Object-region video transformers

R Herzig, E Ben-Avraham… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, video transformers have shown great success in video understanding, exceeding
CNN performance; yet existing video transformer models do not explicitly model objects …

Interventional video relation detection

Y Li, X Yang, X Shang, TS Chua - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Video Visual Relation Detection (VidVRD) aims to semantically describe the dynamic
interactions across visual concepts localized in a video in the form of subject, predicate …

Beyond short-term snippet: Video relation detection with spatio-temporal global context

C Liu, Y Jin, K Xu, G Gong… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video visual relation detection (VidVRD) aims to describe all interacting objects in a video.
Different from relationships in static images, videos contain an additional temporal channel. A …

Classification-then-grounding: Reformulating video scene graphs as temporal bipartite graphs

K Gao, L Chen, Y Niu, J Shao… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Today's VidSGG models are all proposal-based methods, i.e., they first generate numerous
paired subject-object snippets as proposals, and then conduct predicate classification for …

Video visual relation detection via iterative inference

X Shang, Y Li, J Xiao, W Ji, TS Chua - Proceedings of the 29th ACM …, 2021 - dl.acm.org
The core problem of video visual relation detection (VidVRD) lies in accurately classifying
the relation triplets, which comprise the classes of subject and object entities, and the …

PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data

R Herzig, O Abramovich… - Proceedings of the …, 2024 - openaccess.thecvf.com
Action recognition models have achieved impressive results by incorporating scene-level
annotations, such as objects, their relations, 3D structure, and more. However, obtaining …

Video relation detection via tracklet based visual transformer

K Gao, L Chen, Y Huang, J Xiao - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Video Visual Relation Detection (VidVRD) has received significant attention from our
community over recent years. In this paper, we apply the state-of-the-art video object tracklet …

VRDFormer: End-to-end video visual relation detection with transformers

S Zheng, S Chen, Q Jin - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Visual relation understanding plays an essential role for holistic video understanding. Most
previous works adopt a multi-stage framework for video visual relation detection (VidVRD) …