Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer
Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

UniVTG: Towards unified video-language temporal grounding

KQ Lin, P Zhang, J Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (e.g. …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Video self-stitching graph network for temporal action localization

C Zhao, AK Thabet, B Ghanem - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Temporal action localization (TAL) in videos is a challenging task, especially due to the
large variation in action temporal scales. Short actions usually occupy a major proportion in …

An empirical study of end-to-end temporal action detection

X Liu, S Bai, X Bai - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Temporal action detection (TAD) is an important yet challenging task in video
understanding. It aims to simultaneously predict the semantic label and the temporal interval …

LocVTP: Video-text pre-training for temporal localization

M Cao, T Yang, J Weng, C Zhang, J Wang… - European Conference on …, 2022 - Springer
Video-Text Pre-training (VTP) aims to learn transferable representations for various
downstream tasks from large-scale web videos. To date, almost all existing VTP methods …

Zero-shot temporal action detection via vision-language prompting

S Nag, X Zhu, YZ Song, T Xiang - European Conference on Computer …, 2022 - Springer
Existing temporal action detection (TAD) methods rely on large training data including
segment-level annotations, limited to recognizing previously seen classes alone during …

Cross-modal consensus network for weakly supervised temporal action localization

FT Hong, JC Feng, D Xu, Y Shan… - Proceedings of the 29th …, 2021 - dl.acm.org
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to
localize action instances in the given video with video-level categorical supervision …

Proposal-free temporal action detection via global segmentation mask learning

S Nag, X Zhu, YZ Song, T Xiang - European Conference on Computer …, 2022 - Springer
Existing temporal action detection (TAD) methods rely on generating an overwhelmingly
large number of proposals per video. This leads to complex model designs due to proposal …