- Academic Search

Z Lin, D Pathak, B Li, J Li, X **a, G Neubig… - … on Computer Vision, 2024 - Springer

Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Cited by 73 · All 7 versions

[PDF] neurips.cc

Visual programming for step-by-step text-to-image generation and evaluation

J Cho, A Zala, M Bansal - Advances in Neural Information …, 2023 - proceedings.neurips.cc

As large language models have demonstrated impressive performance in many domains,
recent works have adopted language models (LMs) as controllers of visual modules for …

Cited by 67 · All 5 versions

[PDF] arxiv.org

Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-to-image generation

J Cho, Y Hu, R Garg, P Anderson, R Krishna… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …

Cited by 76 · All 4 versions

[PDF] arxiv.org

DOCCI: Descriptions of connected and contrasting images

Y Onoe, S Rane, Z Berger, Y Bitton, J Cho… - … on Computer Vision, 2024 - Springer

Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T)
research. However, current datasets lack descriptions with fine-grained detail that would …

Cited by 34 · All 7 versions

[PDF] arxiv.org

VideoPrism: A foundational visual encoder for video understanding

L Zhao, NB Gundavarapu, L Yuan, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video
understanding tasks with a single frozen model. We pretrain VideoPrism on a …

Cited by 31 · All 10 versions

[PDF] thecvf.com

Evaluating and improving compositional text-to-visual generation

B Li, Z Lin, D Pathak, J Li, Y Fei, K Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …

Cited by 15 · All 3 versions

[PDF] arxiv.org

Contrastive region guidance: Improving grounding in vision-language models without training

D Wan, J Cho, E Stengel-Eskin, M Bansal - European Conference on …, 2024 - Springer

Highlighting particularly relevant regions of an image can improve the performance of vision-
language models (VLMs) on various vision-language (VL) tasks by guiding the model to …

Cited by 24 · All 6 versions

A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - Computers in biology …, 2024 - Elsevier

With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Cited by 5 · All 5 versions

[PDF] thecvf.com

DreamMatcher: appearance matching self-attention for semantically-consistent text-to-image personalization

J Nam, H Kim, DJ Lee, S **, S Kim… - Proceedings of the …, 2024 - openaccess.thecvf.com

The objective of text-to-image (T2I) personalization is to customize a diffusion model to a
user-provided reference concept, generating diverse images of the concept aligned with the …

Cited by 23 · All 7 versions

FineMatch: Aspect-Based Fine-Grained Image and Text Mismatch Detection and Correction

H Hua, J Shi, K Kafle, S Jenni, D Zhang… - … on Computer Vision, 2024 - Springer

Recent progress in large-scale pre-training has led to the development of advanced vision-
language models (VLMs) with remarkable proficiency in comprehending and generating …

Cited by 18 · All 5 versions

What you see is what you read? Improving text-image alignment evaluation

Evaluating text-to-visual generation with image-to-text generation