An efficient spatio-temporal pyramid transformer for action detection

Y Weng, Z Pan, M Han, X Chang, B Zhuang - European Conference on …, 2022 - Springer
The task of action detection aims at deducing both the action category and localization of the
start and end moment for each action instance in a long, untrimmed video. While vision …

Decomposed cross-modal distillation for RGB-based temporal action detection

P Lee, T Kim, M Shim, D Wee… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Temporal action detection aims to predict the time intervals and the classes of action
instances in the video. Despite the promising performance, existing two-stream models …

Localizing moments in long video via multimodal guidance

W Barrios, M Soldan… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent introduction of the large-scale, long-form MAD and Ego4D datasets has enabled
researchers to investigate the performance of current state-of-the-art methods for video …

Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization

C Ju, K Zheng, J Liu, P Zhao, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …

NSNet: Non-saliency suppression sampler for efficient video recognition

B Xia, W Wu, H Wang, R Su, D He, H Yang… - … on Computer Vision, 2022 - Springer
It is challenging for artificial intelligence systems to achieve accurate video recognition
under the scenario of low computation costs. Adaptive inference based efficient video …

Temporal saliency query network for efficient video recognition

B Xia, Z Wang, W Wu, H Wang, J Han - European Conference on …, 2022 - Springer
Efficient video recognition is a hot-spot research topic with the explosive growth of
multimedia data on the Internet and mobile devices. Most existing methods select the salient …

Temporal action localization in the deep learning era: A survey

B Wang, Y Zhao, L Yang, T Long… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The temporal action localization research aims to discover action instances from untrimmed
videos, representing a fundamental step in the field of intelligent video understanding. With …

EgoTV: Egocentric task verification from natural language task descriptions

R Hazra, B Chen, A Rai, N Kamra… - Proceedings of the …, 2023 - openaccess.thecvf.com
To enable progress towards egocentric agents capable of understanding everyday tasks
specified in natural language, we propose a benchmark and a synthetic dataset called …

Multi-level Content-aware Boundary Detection for Temporal Action Proposal Generation

T Su, H Wang, L Wang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
It is challenging to generate temporal action proposals from untrimmed videos. In general,
boundary-based temporal action proposal generators are based on detecting temporal …

MIFNet: Multiple instances focused temporal action proposal generation

L Wang, H Yao, H Yang, S Wang - Neurocomputing, 2023 - Elsevier
Temporal action proposal generation (TAPG) serves as a promising solution for video
analysis. However, the performance of existing methods is still far from satisfactory for real …