Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation

Q Huang, X Dong, P Zhang, B Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Hallucination, posed as a pervasive challenge of multi-modal large language models
(MLLMs), has significantly impeded their real-world usage that demands precise judgment …

A systematic survey of prompt engineering on vision-language foundation models

J Gu, Z Han, S Chen, A Beirami, B He, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …

Multimodal prompt perceiver: Empower adaptiveness, generalizability and fidelity for all-in-one image restoration

Y Ai, H Huang, X Zhou, J Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Despite substantial progress, all-in-one image restoration (IR) grapples with persistent
challenges in handling intricate real-world degradations. This paper introduces MPerceiver …

Parameter-efficient fine-tuning for pre-trained vision models: A survey

Y Xin, S Luo, H Zhou, J Du, X Liu, Y Fan, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability
across various downstream vision tasks. However, with state-of-the-art PVMs growing to …

DePT: Decoupled prompt tuning

J Zhang, S Wu, L Gao, HT Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the
better the tuned model generalizes to the base (or target) task, the worse it generalizes to …

Pro-tuning: Unified prompt tuning for vision tasks

X Nie, B Ni, J Chang, G Meng, C Huo… - … on Circuits and …, 2023 - ieeexplore.ieee.org
In computer vision, fine-tuning is the de facto approach to leverage pre-trained vision
models to perform downstream tasks. However, deploying it in practice is quite challenging …

Not all prompts are secure: A switchable backdoor attack against pre-trained vision transformers

S Yang, J Bai, K Gao, Y Yang, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Given the power of vision transformers, a new learning paradigm, pre-training and then
prompting, makes it more efficient and effective to address downstream visual recognition …

Exploring autonomous agents through the lens of large language models: A review

S Barua - arXiv preprint arXiv:2404.04442, 2024 - arxiv.org
Large Language Models (LLMs) are transforming artificial intelligence, enabling
autonomous agents to perform diverse tasks across various domains. These agents …

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

JJ Wu, ACH Chang, CY Chuang… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper addresses text-supervised semantic segmentation, aiming to learn a model
capable of segmenting arbitrary visual concepts within images by using only image-text …