Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
ActionFormer: Localizing moments of actions with transformers
Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …
Prompting visual-language models for efficient video understanding
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …
BMN: Boundary-matching network for temporal action proposal generation
Temporal action proposal generation is a challenging and promising task which aims to
locate temporal regions in real-world videos where action or event may occur. Current …
G-TAD: Sub-graph localization for temporal action detection
Temporal action detection is a fundamental yet challenging task in video understanding.
Video context is a critical cue to effectively detect actions, but current works mainly focus on …
Graph convolutional networks for temporal action localization
Most state-of-the-art action localization systems process each action proposal individually,
without explicitly exploiting their relations during learning. However, the relations between …
UnLoc: A unified framework for video localization tasks
While large-scale image-text pretrained models such as CLIP have been used for multiple
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …
Relaxed transformer decoders for direct action proposal generation
Temporal action proposal generation is an important and challenging task in video
understanding, which aims at detecting all temporal segments containing action instances of …
BSN: Boundary sensitive network for temporal action proposal generation
Temporal action proposal generation is an important yet challenging problem, since
temporal proposals with rich action content are indispensable for analysing real-world …
Rethinking the Faster R-CNN architecture for temporal action localization
We propose TAL-Net, an improved approach to temporal action localization in video that is
inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key …