Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Dual memory networks: A versatile adaptation approach for vision-language models

Y Zhang, W Zhu, H Tang, Z Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the emergence of pre-trained vision-language models like CLIP, how to adapt them to
various downstream classification tasks has garnered significant attention in recent …

GraphAdapter: Tuning vision-language models with dual knowledge graph

X Li, D Lian, Z Lu, J Bai, Z Chen… - Advances in Neural …, 2023 - proceedings.neurips.cc
Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning
of vision-language models (VLMs) under the low-data regime, where only a few additional …

Adapting visual-language models for generalizable anomaly detection in medical images

C Huang, A Jiang, J Feng, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in large-scale visual-language pre-trained models have led to
significant progress in zero-/few-shot anomaly detection within natural image domains …

Low-rank few-shot adaptation of vision-language models

M Zanella, I Ben Ayed - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further
pushed their generalization capabilities at the expense of just a few labeled samples within …

SG-Former: Self-guided transformer with evolving token reallocation

S Ren, X Yang, S Liu, X Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision Transformer has demonstrated impressive success across various vision tasks.
However, its heavy computation cost, which grows quadratically with respect to the token …

Auxiliary tasks benefit 3D skeleton-based human motion prediction

C Xu, RT Tan, Y Tan, S Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Exploring spatial-temporal dependencies from observed motions is one of the core
challenges of human motion prediction. Previous methods mainly focus on dedicated …

A closer look at the few-shot adaptation of large vision-language models

J Silva-Rodriguez, S Hajimiri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Efficient transfer learning (ETL) is receiving increasing attention to adapt large pre-trained
language-vision models on downstream tasks with a few labeled samples. While significant …

ViLa-MIL: Dual-scale vision-language multiple instance learning for whole slide image classification

J Shi, C Li, T Gong, Y Zheng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Multiple instance learning (MIL)-based framework has become the mainstream for
processing the whole slide image (WSI) with giga-pixel size and hierarchical image context …

Efficient test-time adaptation of vision-language models

A Karmanov, D Guan, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Test-time adaptation with pre-trained vision-language models has attracted increasing
attention for tackling distribution shifts during the test time. Though prior studies have …