Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to deliver promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

MAGVIT: Masked generative video transformer

L Yu, Y Cheng, K Sohn, J Lezama… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …

Styledrop: Text-to-image synthesis of any style

K Sohn, L Jiang, J Barber, K Lee… - Advances in …, 2023 - proceedings.neurips.cc
Pre-trained large text-to-image models synthesize impressive images with an appropriate
use of text prompts. However, ambiguities inherent in natural language, and out-of …

IRSAM: Advancing segment anything model for infrared small target detection

M Zhang, Y Wang, J Guo, Y Li, X Gao… - European Conference on …, 2024 - Springer
The recent Segment Anything Model (SAM) is a significant advancement in natural
image segmentation, exhibiting potent zero-shot performance suitable for various …

Promptir: Prompting for all-in-one image restoration

V Potlapalli, SW Zamir, SH Khan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Image restoration involves recovering a high-quality clean image from its degraded version.
Deep learning-based methods have significantly improved image restoration performance …

Styledrop: Text-to-image generation in any style

K Sohn, N Ruiz, K Lee, DC Chin, I Blok… - ar…

Understanding and improving visual prompting: A label-mapping perspective

A Chen, Y Yao, PY Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
We revisit and advance visual prompting (VP), an input prompting technique for vision tasks.
VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the …

Mind the interference: Retaining pre-trained knowledge in parameter efficient continual learning of vision-language models

L Tang, Z Tian, K Li, C He, H Zhou, H Zhao, X Li… - … on Computer Vision, 2024 - Springer
This study addresses the Domain-Class Incremental Learning problem, a realistic but
challenging continual learning scenario where both the domain distribution and target …