Scene graph generation: A comprehensive survey
Deep learning techniques have led to remarkable breakthroughs in the field of generic
object detection and have spawned a lot of scene-understanding tasks in recent years …
object detection and have spawned a lot of scene-understanding tasks in recent years …
[HTML][HTML] Scene graph generation: A comprehensive survey
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …
Object-region video transformers
Recently, video transformers have shown great success in video understanding, exceeding
CNN performance; yet existing video transformer models do not explicitly model objects …
CNN performance; yet existing video transformer models do not explicitly model objects …
Interventional video relation detection
Video Visual Relation Detection (VidVRD) aims to semantically describe the dynamic
interactions across visual concepts localized in a video in the form of subject, predicate …
interactions across visual concepts localized in a video in the form of subject, predicate …
Beyond short-term snippet: Video relation detection with spatio-temporal global context
Video visual relation detection (VidVRD) aims to describe all interacting objects in a video.
Different from relationships in static images, videos contain an addition temporal channel. A …
Different from relationships in static images, videos contain an addition temporal channel. A …
Classification-then-grounding: Reformulating video scene graphs as temporal bipartite graphs
Today's VidSGG models are all proposal-based methods, ie, they first generate numerous
paired subject-object snippets as proposals, and then conduct predicate classification for …
paired subject-object snippets as proposals, and then conduct predicate classification for …
Video visual relation detection via iterative inference
The core problem of video visual relation detection (VidVRD) lies in accurately classifying
the relation triplets, which comprise of the classes of subject and object entities, and the …
the relation triplets, which comprise of the classes of subject and object entities, and the …
Promptonomyvit: Multi-task prompt learning improves video transformers using synthetic scene data
Action recognition models have achieved impressive results by incorporating scene-level
annotations, such as objects, their relations, 3D structure, and more. However, obtaining …
annotations, such as objects, their relations, 3D structure, and more. However, obtaining …
Video relation detection via tracklet based visual transformer
Video Visual Relation Detection (VidVRD), has received significant attention of our
community over recent years. In this paper, we apply the state-of-the-art video object tracklet …
community over recent years. In this paper, we apply the state-of-the-art video object tracklet …
Vrdformer: End-to-end video visual relation detection with transformers
Visual relation understanding plays an essential role for holistic video understanding. Most
previous works adopt a multi-stage framework for video visual relation detection (VidVRD) …
previous works adopt a multi-stage framework for video visual relation detection (VidVRD) …