Zero-shot video grounding with pseudo query lookup and verification
Video grounding, the process of identifying a specific moment in an untrimmed video based
on a natural language query, has become a popular topic in video understanding. However …
Relative-position embedding based spatially and temporally decoupled Transformer for action recognition
Recognition of human actions is to classify actions in a video. Recently, Vision Transformer
(ViT) has been applied to action recognition. However, the Vision Transformer is unsuitable …
Multi-frame super-resolution of remote sensing images using attention-based GAN models
Multi-frame super-resolution (MFSR) of remote sensing (RS) imageries becomes a critical
research topic with the launch of new satellites having video capturing capability and the …
Deep multimodal representation learning for generalizable person re-identification
Person re-identification plays a significant role in realistic scenarios due to its various
applications in public security and video surveillance. Recently, leveraging the supervised …