Google Scholar

VGDiffZero: Text-to-image diffusion models can be zero-shot visual grounders

X Liu, S Huang, Y Kang, H Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Large-scale text-to-image diffusion models have shown impressive capabilities for
generative tasks by leveraging strong vision-language alignment from pre-training …

Cited by 12

Unified Text-to-Image Generation and Retrieval

L Qu, H Li, T Wang, W Wang, Y Li, L Nie… - arXiv preprint arXiv …, 2024 - arxiv.org

How humans can efficiently and effectively acquire images has always been a perennial
question. A typical solution is text-to-image retrieval from an existing database given the text …

Cited by 3

Revelio: Interpreting and leveraging semantic information in diffusion models

D Kim, X Thomas, D Ghadiyaram - arXiv preprint arXiv:2411.16725, 2024 - arxiv.org

We study how rich visual semantic information is represented within various
layers and denoising timesteps of different diffusion architectures. We uncover …

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

X He, J Zheng, JZ Fang, R Piramuthu, M Bansal… - arXiv preprint arXiv …, 2024 - arxiv.org

Controllable text-to-image (T2I) diffusion models generate images conditioned on both text
prompts and semantic inputs of other modalities like edge maps. Nevertheless, current …

GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models

D De, S Mitra, R Soundararajan - arXiv preprint arXiv:2406.04654, 2024 - arxiv.org

The design of no-reference (NR) image quality assessment (IQA) algorithms is extremely
important to benchmark and calibrate user experiences in modern visual systems. A major …

Multimodal Understanding using Stable-Diffusion as a Task Aware Feature Extractor

V Agarwal, G Kohavi, M Gwilliam, E Verma, D Ulbricht… - vatsalag99.github.io

Multimodal large language models have shown tremendous advancements in parsing and
reasoning about complex scenes. However, recent research has highlighted the weak vision …

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

VGDiffZero: Text-to-image diffusion models can be zero-shot visual grounders
