Efficient Transfer Learning for Video-language Foundation Models

H Chen, Z Huang, Y Hong, Y Wang, Z Lyu, Z Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Pre-trained vision-language models provide a robust foundation for efficient transfer
learning across various downstream tasks. In the field of video action recognition …

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

X Lin, Y Peng, L Wang, X Zhong, M Zhu, J Yang… - arXiv preprint arXiv …, 2025 - arxiv.org
Category-level object pose estimation aims to recover the rotation, translation and size of
unseen instances within predefined categories. In this task, deep neural network-based …