- Academic Search

What you see is what you read? Improving text-image alignment evaluation

M Yarom, Y Bitton, S Changpinyo… - Advances in …, 2023 - proceedings.neurips.cc

Automatically determining whether a text and a corresponding image are semantically
aligned is a significant challenge for vision-language models, with applications in generative …

Cited by 67 · All 8 versions

Linguistic binding in diffusion models: Enhancing attribute correspondence through attention map alignment

R Rassin, E Hirsch, D Glickman… - Advances in …, 2023 - proceedings.neurips.cc

Text-conditioned image generation models often generate incorrect associations between
entities and their visual attributes. This reflects an impaired mapping between linguistic …

Cited by 77 · All 5 versions

Teaching CLIP to count to ten

R Paiss, A Ephrat, O Tov, S Zada… - Proceedings of the …, 2023 - openaccess.thecvf.com

Large vision-language models, such as CLIP, learn robust representations of text and
images, facilitating advances in many downstream tasks, including zero-shot classification …

Cited by 80 · All 10 versions

Compositional abilities emerge multiplicatively: Exploring diffusion models on a synthetic task

M Okawa, ES Lubana, R Dick… - Advances in Neural …, 2023 - proceedings.neurips.cc

Modern generative models exhibit unprecedented capabilities to generate extremely
realistic data. However, given the inherent compositionality of the real world, reliable use of …

Cited by 45 · All 7 versions

Emergence of hidden capabilities: Exploring learning dynamics in concept space

CF Park, M Okawa, A Lee… - Advances in Neural …, 2025 - proceedings.neurips.cc

Modern generative models demonstrate impressive capabilities, likely stemming from an
ability to identify and manipulate abstract concepts underlying their training data. However …

Cited by 6 · All 5 versions

DALL·E 2 fails to reliably capture common syntactic processes

E Leivada, E Murphy, G Marcus - Social Sciences & Humanities Open, 2023 - Elsevier

Machine intelligence is increasingly being linked to claims about sentience,
language processing, and an ability to comprehend and transform natural language into a …

Cited by 53 · All 4 versions

SemEval-2023 task 1: Visual word sense disambiguation

A Raganato, I Calixto, A Ushio… - … 2023-Proceedings of …, 2023 - boa.unimib.it

This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The
objective of Visual-WSD is to identify among a set of ten images the one that corresponds to …

Cited by 39 · All 10 versions

Auditing gender presentation differences in text-to-image models

Y Zhang, L Jiang, G Turk, D Yang - … of the 4th ACM Conference on Equity …, 2024 - dl.acm.org

Text-to-image models, which can generate high-quality images based on textual input, have
recently enabled various content-creation tools. Despite significantly affecting a wide range …

Cited by 23 · All 5 versions

Global-local image perceptual score (GLIPS): Evaluating photorealistic quality of AI-generated images

M Aziz, U Rehman, MU Danish… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org

This article introduces the global-local image perceptual score (GLIPS), an image metric
designed to assess the photorealistic image quality of AI-generated images with a high …

Cited by 9 · All 3 versions

Object-conditioned energy-based attention map alignment in text-to-image diffusion models

Y Zhang, P Yu, YN Wu - European Conference on Computer Vision, 2024 - Springer

Text-to-image diffusion models have shown great success in generating high-quality
text-guided images. Yet, these models may still fail to semantically align generated images with …

Cited by 5 · All 7 versions

DALLE-2 is seeing double: Flaws in word-to-concept mapping in Text2Image models
