- Academic Search

Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer...

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent

J Zhang, Y Guo, Y Hu, X Chen, X Zhu… - arXiv preprint arXiv …, 2025 - arxiv.org

Recent advancements in Vision-Language-Action (VLA) models have leveraged pre-trained
Vision-Language Models (VLMs) to improve the generalization capabilities. VLMs, typically …

Exploring annotation-free image captioning with retrieval-augmented pseudo sentence generation

Z Li, D Liu, H Wang, C Zhang, W Cai - Proceedings of the 6th ACM …, 2024 - dl.acm.org

Recently, training an image captioner without annotated image-sentence pairs has gained
traction. Previous methods have faced limitations due to either using mismatched corpora for …

Cited by 1; 3 versions
