Zero-shot video grounding with pseudo query lookup and verification
Video grounding, the process of identifying a specific moment in an untrimmed video based
on a natural language query, has become a popular topic in video understanding. However …
Relative-position embedding based spatially and temporally decoupled Transformer for action recognition
Human action recognition is the task of classifying actions in a video. Recently, Vision Transformer
(ViT) has been applied to action recognition. However, the Vision Transformer is unsuitable …
Multi-frame super-resolution of remote sensing images using attention-based GAN models
Multi-frame super-resolution (MFSR) of remote sensing (RS) imagery has become a critical
research topic with the launch of new satellites having video capturing capability and the …
Action sensitivity learning for temporal action localization
Temporal action localization (TAL), which involves recognizing and locating action
instances, is a challenging task in video understanding. Most existing approaches directly …