Understanding cities with machine eyes: A review of deep computer vision in urban analytics
Modelling urban systems has interested planners and modellers for decades. Different
models have been achieved relying on mathematics, cellular automation, complexity, and …
A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Prompting visual-language models for efficient video understanding
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …
TN-ZSTAD: Transferable network for zero-shot temporal activity detection
An integral part of video analysis and surveillance is temporal activity detection, which
means to simultaneously recognize and localize activities in long untrimmed videos …
Learning salient boundary feature for anchor-free temporal action localization
Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …
End-to-end temporal action detection with transformer
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …
G-TAD: Sub-graph localization for temporal action detection
Temporal action detection is a fundamental yet challenging task in video understanding.
Video context is a critical cue to effectively detect actions, but current works mainly focus on …
M2TR: Multi-modal multi-scale transformers for deepfake detection
The widespread dissemination of Deepfakes demands effective approaches that can detect
perceptually convincing forged images. In this paper, we aim to capture the subtle …
Graph convolutional networks for temporal action localization
Most state-of-the-art action localization systems process each action proposal individually,
without explicitly exploiting their relations during learning. However, the relations between …
ASFormer: Transformer for action segmentation
Algorithms for the action segmentation task typically use temporal models to predict what
action is occurring at each frame for a minute-long daily activity. Recent studies have shown …