VGDiffZero: Text-to-image diffusion models can be zero-shot visual grounders
Large-scale text-to-image diffusion models have shown impressive capabilities for
generative tasks by leveraging strong vision-language alignment from pre-training …
Unified Text-to-Image Generation and Retrieval
How humans can efficiently and effectively acquire images has always been a perennial
question. A typical solution is text-to-image retrieval from an existing database given the text …
: Interpreting and leveraging semantic information in diffusion models
We study how rich visual semantic information is represented within various
layers and denoising timesteps of different diffusion architectures. We uncover …
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Controllable text-to-image (T2I) diffusion models generate images conditioned on both text
prompts and semantic inputs of other modalities like edge maps. Nevertheless, current …
GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models
The design of no-reference (NR) image quality assessment (IQA) algorithms is extremely
important to benchmark and calibrate user experiences in modern visual systems. A major …
Multimodal Understanding using Stable-Diffusion as a Task Aware Feature Extractor
Multimodal large language models have shown tremendous advancements in parsing and
reasoning about complex scenes. However, recent research has highlighted the weak vision …