Review of large vision models and visual prompt engineering
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …
CLIP in medical imaging: A comprehensive survey
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …
Long-CLIP: Unlocking the long-text capability of CLIP
Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot
classification, text-image retrieval, and text-image generation by aligning image and …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Blind image quality assessment via vision-language correspondence: A multitask learning perspective
We aim at advancing blind image quality assessment (BIQA), which predicts the human
perception of image quality without any reference information. We develop a general and …
MotionCLIP: Exposing human motion generation to CLIP space
We introduce MotionCLIP, a 3D human motion auto-encoder featuring a latent embedding
that is disentangled, well behaved, and supports highly semantic textual descriptions …
LLM-grounded diffusion: Enhancing prompt understanding of text-to-image diffusion models with large language models
Recent advancements in text-to-image diffusion models have yielded impressive results in
generating realistic and diverse images. However, these models still struggle with complex …
RePrompt: Automatic prompt editing to refine AI-generative art towards precise expressions
Generative AI models have shown impressive ability to produce images with text prompts,
which could benefit creativity in visual art creation and self-expression. However, it is …
Teaching CLIP to count to ten
Large vision-language models, such as CLIP, learn robust representations of text and
images, facilitating advances in many downstream tasks, including zero-shot classification …
CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders
CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It
does not require any additional training; rather, a pre-trained CLIP language-image encoder …