- Academic Search

Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier

Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

Cited by 149 · All 4 versions

[PDF] arxiv.org

CLIP in medical imaging: A comprehensive survey

Z Zhao, Y Liu, H Wu, M Wang, Y Li, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …

Cited by 52 · All 2 versions
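The snippet above describes CLIP as a contrastive pre-training paradigm that brings text supervision to vision models. As a rough illustration only (not code from the survey or from the CLIP authors), the symmetric contrastive objective over a batch of matched image-text pairs can be sketched in plain Python: embeddings are L2-normalized, pairwise cosine similarities are divided by a temperature, and cross-entropy is applied in both the image-to-text and text-to-image directions, with each matching pair on the diagonal. The function names and toy 2-D embeddings are hypothetical; the default temperature of 0.07 mirrors the initial value reported for CLIP, which learns this parameter during training.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_at(logits: Vector, idx: int) -> float:
    """log(softmax(logits)[idx]), computed stably via log-sum-exp."""
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return logits[idx] - lse


def clip_contrastive_loss(
    image_embs: List[Vector],
    text_embs: List[Vector],
    temperature: float = 0.07,
) -> float:
    """Symmetric InfoNCE-style loss; pair i is (image_embs[i], text_embs[i]).

    Hypothetical helper for illustration; real CLIP training operates on
    encoder outputs and learns the temperature.
    """
    n = len(image_embs)
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image -> text: row i should put its probability mass on column i.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should put its probability mass on row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


eye = [[1.0, 0.0], [0.0, 1.0]]
# Correct pairing should score a lower loss than swapped captions.
print(clip_contrastive_loss(eye, eye) < clip_contrastive_loss(eye, eye[::-1]))  # True
```

Correctly aligned pairs give a loss near zero, swapping the captions gives a large loss, and when every similarity is identical the loss reduces to log(n), the cost of a uniform guess.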

[PDF] arxiv.org

Long-CLIP: Unlocking the long-text capability of CLIP

B Zhang, P Zhang, X Dong, Y Zang, J Wang - European Conference on …, 2024 - Springer

Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and …

Cited by 81 · All 2 versions
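The snippet above credits CLIP's image-text alignment for zero-shot classification. A minimal sketch of that idea, assuming the embeddings have already been produced by CLIP's image and text encoders (replaced here by hypothetical toy vectors), scores each candidate label's prompt embedding by cosine similarity to the image embedding, converts the scaled scores into a softmax distribution, and returns the argmax. The prompt template mentioned in the docstring and the temperature of 0.01 are illustrative assumptions, not values taken from the Long-CLIP paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    image_emb: Vector,
    prompt_embs: Dict[str, Vector],
    temperature: float = 0.01,  # illustrative assumption
) -> Tuple[str, Dict[str, float]]:
    """Return the best label and a softmax distribution over all labels.

    prompt_embs maps each label to the (hypothetical) text embedding of a
    prompt such as "a photo of a {label}"; no real encoder is called here.
    """
    labels = list(prompt_embs)
    logits = [cosine_similarity(image_emb, prompt_embs[lbl]) / temperature for lbl in labels]
    # Stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {lbl: e / total for lbl, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy usage with made-up 3-D embeddings (hypothetical values).
label, distribution = zero_shot_classify(
    [0.8, 0.2, 0.1],
    {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0]},
)
print(label)  # cat
```

Because only the label prompts change, new classes can be added without retraining, which is what makes the scheme "zero-shot".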

[PDF] ieee.org

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 640 · All 9 versions

[PDF] thecvf.com

Blind image quality assessment via vision-language correspondence: A multitask learning perspective

W Zhang, G Zhai, Y Wei, X Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We aim at advancing blind image quality assessment (BIQA), which predicts the human
perception of image quality without any reference information. We develop a general and …

Cited by 182 · All 7 versions

[PDF] arxiv.org

MotionCLIP: Exposing human motion generation to CLIP space

G Tevet, B Gordon, A Hertz, AH Bermano… - … on Computer Vision, 2022 - Springer

We introduce MotionCLIP, a 3D human motion auto-encoder featuring a latent embedding
that is disentangled, well behaved, and supports highly semantic textual descriptions …

Cited by 322 · All 8 versions

[PDF] arxiv.org

LLM-grounded diffusion: Enhancing prompt understanding of text-to-image diffusion models with large language models

L Lian, B Li, A Yala, T Darrell - arXiv preprint arXiv:2305.13655, 2023 - arxiv.org

Recent advancements in text-to-image diffusion models have yielded impressive results in
generating realistic and diverse images. However, these models still struggle with complex …

Cited by 148 · All 3 versions

[PDF] arxiv.org

RePrompt: Automatic prompt editing to refine AI-generative art towards precise expressions

Y Wang, S Shen, BY Lim - Proceedings of the 2023 CHI conference on …, 2023 - dl.acm.org

Generative AI models have shown impressive ability to produce images with text prompts,
which could benefit creativity in visual art creation and self-expression. However, it is …

Cited by 93 · All 3 versions

[PDF] thecvf.com

Teaching CLIP to count to ten

R Paiss, A Ephrat, O Tov, S Zada… - Proceedings of the …, 2023 - openaccess.thecvf.com

Large vision-language models, such as CLIP, learn robust representations of text and
images, facilitating advances in many downstream tasks, including zero-shot classification …

Cited by 75 · All 10 versions

[PDF] neurips.cc

CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders

K Frans, L Soros, O Witkowski - Advances in Neural …, 2022 - proceedings.neurips.cc

CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It
does not require any additional training; rather, a pre-trained CLIP language-image encoder …

Cited by 196 · All 5 versions
