Learnable feature augmentation framework for temporal action localization

Y Tang, W Wang, C Zhang, J Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Temporal action localization (TAL) has drawn much attention in recent years; however, the
performance of previous methods is still far from satisfactory due to the lack of annotated …

Adapting short-term transformers for action detection in untrimmed videos

M Yang, H Gao, P Guo, L Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision Transformer (ViT) has shown high potential in video recognition owing to its
flexible design, adaptable self-attention mechanisms and the efficacy of masked pre-training …

Dr2Net: Dynamic reversible dual-residual networks for memory-efficient finetuning

C Zhao, S Liu, K Mangalam, G Qian… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large pretrained models are increasingly crucial in modern computer vision tasks. These
models are typically used in downstream tasks by end-to-end finetuning which is highly …

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

Y Wang, J Xu, Y He, Z Song, L Wang… - Advances in …, 2025 - proceedings.neurips.cc
Video understanding relies on accurate action detection for temporal analysis. However,
existing mainstream methods have limitations in real-world applications due to their offline …

Online episodic memory visual query localization with egocentric streaming object memory

Z Manigrasso, M Dunnhofer, A Furnari… - arXiv preprint arXiv …, 2024 - arxiv.org
Episodic memory retrieval aims to enable wearable devices with the ability to recollect from
past video observations objects or events that have been observed (e.g., "where did I last see …

Semi‐supervised pipe video temporal defect interval localization

Z Huang, G Pan, C Kang, YZ Lv - Computer‐Aided Civil and …, 2024 - Wiley Online Library
In sewer pipe closed‐circuit television inspection, accurate temporal defect localization is
essential for effective pipe assessment. Industry standards typically do not require time …

Multi-scale Graph Convolutional Network for understanding human action in videos

H Wang, S Zhang, Q Tian, L Wang, B Luo… - Advanced Engineering …, 2025 - Elsevier
Temporal action detection aims to classify and locate human actions in videos, which has
been a difficult challenge in the field of smart transportation and intelligent manufacturing. In …

A transformer-based convolutional local attention (ConvLoA) method for temporal action localization

S Artham, SH Shaikh - International Journal of Machine Learning and …, 2024 - Springer
In the realm of temporal localization in videos, our research introduces a novel framework
that achieves significant results in event localization in videos. We depart from conventional …

Temporal action localization with State-Sensitive Mamba and centroid sequences enhancement

P Wang, S Lu, C Dai, S Dai, B Guo - Neurocomputing, 2025 - Elsevier
The temporal action localization task aims to identify and localize human behaviors in
unedited videos. However, most previous studies have employed sampling processing and …

Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models

C Han, H Wang, J Kuang, L Zhang, J Gui - arXiv preprint arXiv:2501.13795, 2025 - arxiv.org
Existing zero-shot temporal action detection (ZSTAD) methods predominantly use fully
supervised or unsupervised strategies to recognize unseen activities. However, these …