Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

Revisiting classifier: Transferring vision-language models for video recognition

W Wu, Z Sun, W Ouyang - Proceedings of the AAAI conference on …, 2023 - ojs.aaai.org
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …

Transition is a process: Pair-to-video change detection networks for very high resolution remote sensing images

M Lin, G Yang, H Zhang - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
As an important yet challenging task in Earth observation, change detection (CD) is
undergoing a technological revolution, given the broadening application of deep learning …

RSPNet: Relative speed perception for unsupervised video representation learning

P Chen, D Huang, D He, X Long, R Zeng… - Proceedings of the …, 2021 - ojs.aaai.org
We study unsupervised video representation learning that seeks to learn both motion and
appearance features from unlabeled video only, which can be reused for downstream tasks …

Transferring vision-language models for visual recognition: A classifier perspective

W Wu, Z Sun, Y Song, J Wang, W Ouyang - International Journal of …, 2024 - Springer
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …

Adversarial feature augmentation for cross-domain few-shot classification

Y Hu, AJ Ma - European conference on computer vision, 2022 - Springer
Few-shot classification is a promising approach to solving the problem of classifying novel
classes with only limited annotated data for training. Existing methods based on meta …

MGSampler: An explainable sampling strategy for video action recognition

Y Zhi, Z Tong, L Wang, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Frame sampling is a fundamental problem in video action recognition due to the essential
redundancy in time and limited computation resources. The existing sampling strategy often …

An efficient motion visual learning method for video action recognition

B Wang, F Chang, C Liu, W Wang, R Ma - Expert Systems with Applications, 2024 - Elsevier
Currently, efficient spatio-temporal information modeling is one of the key research
components to solve the action recognition problem. Previous approaches focus on …

ASCNet: Self-supervised video representation learning with appearance-speed consistency

D Huang, W Wu, W Hu, X Liu, D He… - Proceedings of the …, 2021 - openaccess.thecvf.com
We study self-supervised video representation learning, which is a challenging task due to
1) insufficient labels for supervision; 2) unstructured and noisy visual information. Existing …

AGPN: Action granularity pyramid network for video action recognition

Y Chen, H Ge, Y Liu, X Cai… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video action recognition is a fundamental task for video understanding. Action recognition in
complex spatio-temporal contexts generally requires fusing different multi-granularity …