Weakly supervised object localization and detection: A survey
As an emerging and challenging problem in the computer vision community, weakly
supervised object localization and detection plays an important role for developing new …
Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
Learning salient boundary feature for anchor-free temporal action localization
Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …
End-to-end temporal action detection with transformer
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …
Fine-grained temporal contrastive learning for weakly-supervised temporal action localization
We target at the task of weakly-supervised action localization (WSAL), where only video-
level action labels are available during model training. Despite the recent progress, existing …
Dual-evidential learning for weakly-supervised temporal action localization
Weakly-supervised temporal action localization (WS-TAL) aims to localize the action
instances and recognize their categories with only video-level labels. Despite great …
Revisiting anchor mechanisms for temporal action localization
Most of the current action localization methods follow an anchor-based pipeline: depicting
action instances by pre-defined anchors, learning to select the anchors closest to the ground …
TSP: Temporally-sensitive pretraining of video encoders for localization tasks
Due to the large memory footprint of untrimmed videos, current state-of-the-art video
localization methods operate atop precomputed video clip features. These features are …
Video moment retrieval with cross-modal neural architecture search
The task of video moment retrieval (VMR) is to retrieve the specific video moment from an
untrimmed video, according to a textual query. It is a challenging task that requires effective …
Self-mutual distillation learning for continuous sign language recognition
In recent years, deep learning moves video-based Continuous Sign Language Recognition
(CSLR) significantly forward. Currently, a typical network combination for CSLR includes a …