- Academic Search

Y Oh, P Ahn, J Kim, G Song, S Lee, IS Kweon… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision and language models (VLMs) such as CLIP have showcased remarkable zero-shot
recognition abilities yet face challenges in visio-linguistic compositionality, particularly in …

[PDF] aclanthology.org

Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding

W Li, Z Huang, X Tian, L Lu, H Li… - Proceedings of the …, 2024 - aclanthology.org

Contrastively trained vision-language models such as CLIP have achieved remarkable
progress in vision and language representation learning. Despite the promising progress …

[PDF] arxiv.org

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

X Zhu, P Sun, Y Song, Y Xiao, Z Li, C Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Accurate interpretation and visualization of human instructions are crucial for text-to-image
(T2I) synthesis. However, current models struggle to capture semantic variations from word …

Saved in My library

ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation

Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective