Review of large vision models and visual prompt engineering
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …
MAGVIT: Masked generative video transformer
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …
StyleDrop: Text-to-image synthesis of any style
Pre-trained large text-to-image models synthesize impressive images with an appropriate
use of text prompts. However, ambiguities inherent in natural language, and out-of …
IRSAM: Advancing segment anything model for infrared small target detection
The recent Segment Anything Model (SAM) is a significant advancement in natural
image segmentation, exhibiting potent zero-shot performance suitable for various …
PromptIR: Prompting for all-in-one image restoration
Image restoration involves recovering a high-quality clean image from its degraded version.
Deep learning-based methods have significantly improved image restoration performance …
Understanding and improving visual prompting: A label-mapping perspective
We revisit and advance visual prompting (VP), an input prompting technique for vision tasks.
VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the …
Mind the interference: Retaining pre-trained knowledge in parameter-efficient continual learning of vision-language models
This study addresses the Domain-Class Incremental Learning problem, a realistic but
challenging continual learning scenario where both the domain distribution and target …