Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

Video-of-thought: Step-by-step video reasoning from perception to cognition

H Fei, S Wu, W Ji, H Zhang, M Zhang, ML Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing research of video understanding still struggles to achieve in-depth comprehension
and reasoning in complex videos, primarily due to the under-exploration of two key …

RelTR: Relation transformer for scene graph generation

Y Cong, MY Yang, B Rosenhahn - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Different objects in the same scene are more or less related to each other, but only a limited
number of these relationships are noteworthy. Inspired by Detection Transformer, which …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

Text to image generation with semantic-spatial aware GAN

W Liao, K Hu, MY Yang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Text-to-image synthesis (T2I) aims to generate photo-realistic images which are
semantically consistent with the text descriptions. Existing methods are usually built upon …

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

Delving into sequential patches for deepfake detection

J Guan, H Zhou, Z Hong, E Ding… - Advances in …, 2022 - proceedings.neurips.cc
Recent advances in face forgery techniques produce nearly visually untraceable deepfake
videos, which could be leveraged with malicious intentions. As a result, researchers have …

MASTER: Market-guided stock transformer for stock price forecasting

T Li, Z Liu, Y Shen, X Wang, H Chen… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Stock price forecasting has remained an extremely challenging problem for many decades
due to the high volatility of the stock market. Recent efforts have been devoted to modeling …

SportsHHI: A dataset for human-human interaction detection in sports videos

T Wu, R He, G Wu, L Wang - … of the IEEE/CVF conference on …, 2024 - openaccess.thecvf.com
Video-based visual relation detection tasks such as video scene graph generation play
important roles in fine-grained video understanding. However, current video visual relation …

Region-focused multi-view transformer-based generative adversarial network for cardiac cine MRI reconstruction

J Lyu, G Li, C Wang, C Qin, S Wang, Q Dou, J Qin - Medical Image Analysis, 2023 - Elsevier
Cardiac cine magnetic resonance imaging (MRI) reconstruction is challenging due to spatial
and temporal resolution trade-offs. Temporal correlation in cardiac cine MRI is informative …