Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
Revisiting classifier: Transferring vision-language models for video recognition
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …
Transition is a process: Pair-to-video change detection networks for very high resolution remote sensing images
As an important yet challenging task in Earth observation, change detection (CD) is
undergoing a technological revolution, given the broadening application of deep learning …
RSPNet: Relative speed perception for unsupervised video representation learning
We study unsupervised video representation learning that seeks to learn both motion and
appearance features from unlabeled video only, which can be reused for downstream tasks …
Transferring vision-language models for visual recognition: A classifier perspective
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …
Adversarial feature augmentation for cross-domain few-shot classification
Y Hu, AJ Ma - European conference on computer vision, 2022 - Springer
Few-shot classification is a promising approach to solving the problem of classifying novel
classes with only limited annotated data for training. Existing methods based on meta …
MGSampler: An explainable sampling strategy for video action recognition
Frame sampling is a fundamental problem in video action recognition due to the essential
redundancy in time and limited computation resources. The existing sampling strategy often …
An efficient motion visual learning method for video action recognition
Currently, efficient spatio-temporal information modeling is one of the key research
components to solve the action recognition problem. Previous approaches focus on …
ASCNet: Self-supervised video representation learning with appearance-speed consistency
We study self-supervised video representation learning, which is a challenging task due to
1) insufficient labels for supervision; 2) unstructured and noisy visual information. Existing …
AGPN: Action granularity pyramid network for video action recognition
Video action recognition is a fundamental task for video understanding. Action recognition in
complex spatio-temporal contexts generally requires fusing of different multi-granularity …