Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

Panoptic video scene graph generation

J Yang, W Peng, X Li, Z Guo, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Towards building comprehensive real-world visual perception systems, we propose and
study a new problem called panoptic scene graph generation (PVSG). PVSG is related to …

Enriching local and global contexts for temporal action localization

Z Zhu, W Tang, L Wang, N Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual
representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for …

SportsHHI: A dataset for human-human interaction detection in sports videos

T Wu, R He, G Wu, L Wang - … of the IEEE/CVF conference on …, 2024 - openaccess.thecvf.com
Video-based visual relation detection tasks such as video scene graph generation play
important roles in fine-grained video understanding. However, current video visual relation …

Continuous scene representations for embodied AI

SY Gadre, K Ehsani, S Song… - Proceedings of the …, 2022 - openaccess.thecvf.com
We propose Continuous Scene Representations (CSR), a scene representation
constructed by an embodied agent navigating within a space, where objects and their …

Target adaptive context aggregation for video scene graph generation

Y Teng, L Wang, Z Li, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
This paper deals with a challenging task of video scene graph generation (VidSGG), which
could serve as a structured video representation for high-level understanding tasks. We …

Interventional video relation detection

Y Li, X Yang, X Shang, TS Chua - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Video Visual Relation Detection (VidVRD) aims to semantically describe the dynamic
interactions across visual concepts localized in a video in the form of subject, predicate …

Few-shot human–object interaction video recognition with transformers

Q Li, X Xie, J Zhang, G Shi - Neural Networks, 2023 - Elsevier
We propose a novel few-shot learning framework that can recognize human–object
interaction (HOI) classes with a few labeled samples. We achieve this by leveraging a meta …

Beyond MOT: Semantic multi-object tracking

Y Li, Q Li, H Wang, X Ma, J Yao, S Dong, H Fan… - … on Computer Vision, 2024 - Springer
Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., “where”) in
videos. Yet, knowing merely “where” is insufficient in many crucial applications. In …

Scene graph contrastive learning for embodied navigation

KP Singh, J Salvador, L Weihs… - Proceedings of the …, 2023 - openaccess.thecvf.com
Training effective embodied AI agents often involves expert imitation, specialized
components such as maps, or leveraging additional sensors for depth and localization …