Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

LLMScore: Unveiling the power of large language models in text-to-image synthesis evaluation

Y Lu, X Yang, X Li, XE Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Existing automatic evaluation of text-to-image synthesis can only provide an image-text
matching score, without considering object-level compositionality, which results in poor …

Multimodal procedural planning via dual text-image prompting

Y Lu, P Lu, Z Chen, W Zhu, XE Wang… - arXiv preprint - arxiv.org

World-to-words: Grounded open vocabulary acquisition through fast mapping in vision-language models

Z Ma, J Pan, J Chai - arXiv preprint arXiv:2306.08685, 2023 - arxiv.org
The ability to connect language units to their referents in the physical world, referred to as
grounding, is crucial to learning and understanding grounded meanings of words. While …