Rethinking the heatmap regression for bottom-up human pose estimation
Heatmap regression has become the most prevalent choice for nowadays human pose
estimation methods. The ground-truth heatmaps are usually constructed by covering all …
Enriching local and global contexts for temporal action localization
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual
representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for …
Uncertainty-aware Action Decoupling Transformer for Action Anticipation
Human action anticipation aims at predicting what people will do in the future based on past
observations. In this paper we introduce Uncertainty-aware Action Decoupling Transformer …
Learning grounded vision-language representation for versatile understanding in untrimmed videos
Joint video-language learning has received increasing attention in recent years. However,
existing works mainly focus on single or multiple trimmed video clips (events), which makes …
ASTRA: An action spotting transformer for soccer videos
In this paper, we introduce ASTRA, a Transformer-based model designed for the task of
Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the …
ContextLoc++: A unified context model for temporal action localization
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual
representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for …
Multi-dimensional attention with similarity constraint for weakly-supervised temporal action localization
Weakly-supervised temporal action localization (WTAL) is a challenging task in
understanding untrimmed videos, in which no frame-wise annotation is provided during …
TVNet: Temporal voting network for action localization
We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos.
This incorporates a novel Voting Evidence Module to locate temporal boundaries, more …
Class‐wise boundary regression by uncertainty in temporal action detection
Y Chen, M Chen, Q Gu - IET Image Processing, 2022 - Wiley Online Library
Temporal action detection is a crucial aspect of video understanding. It aims to classify the
action as well as locate the start and end boundaries of the action in the untrimmed videos …
Distribution-aware Activity Boundary Representation for Online Detection of Action Start in Untrimmed Videos
X Hu, S Wang, M Li, Y Li, S Du - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
The Online Detection of Action Start (ODAS) has attracted the attention of researchers
because of its practical applications in areas such as security and emergency response …