An efficient spatio-temporal pyramid transformer for action detection
The task of action detection aims at deducing both the action category and localization of the
start and end moment for each action instance in a long, untrimmed video. While vision …
Decomposed cross-modal distillation for RGB-based temporal action detection
Temporal action detection aims to predict the time intervals and the classes of action
instances in the video. Despite the promising performance, existing two-stream models …
Localizing moments in long video via multimodal guidance
The recent introduction of the large-scale, long-form MAD and Ego4D datasets has enabled
researchers to investigate the performance of current state-of-the-art methods for video …
Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …
NSNet: Non-saliency suppression sampler for efficient video recognition
It is challenging for artificial intelligence systems to achieve accurate video recognition
under the scenario of low computation costs. Adaptive inference based efficient video …
Temporal saliency query network for efficient video recognition
Efficient video recognition is a hot-spot research topic with the explosive growth of
multimedia data on the Internet and mobile devices. Most existing methods select the salient …
Temporal action localization in the deep learning era: A survey
The temporal action localization research aims to discover action instances from untrimmed
videos, representing a fundamental step in the field of intelligent video understanding. With …
EgoTV: Egocentric task verification from natural language task descriptions
To enable progress towards egocentric agents capable of understanding everyday tasks
specified in natural language, we propose a benchmark and a synthetic dataset called …
Multi-level Content-aware Boundary Detection for Temporal Action Proposal Generation
T Su, H Wang, L Wang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
It is challenging to generate temporal action proposals from untrimmed videos. In general,
boundary-based temporal action proposal generators are based on detecting temporal …
MIFNet: Multiple instances focused temporal action proposal generation
Temporal action proposal generation (TAPG) serves as a promising solution for video
analysis. However, the performance of existing methods is still far from satisfactory for real …