- Academic Search

3 results (0.02 seconds)

LocCa: Visual Pretraining with Location-aware Captioners

[PDF] arxiv.org

PaliGemma: A versatile 3B VLM for transfer

L Beyer, A Steiner, AS Pinto, A Kolesnikov… - arXiv preprint arXiv …, 2024 - arxiv.org

PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m
vision encoder and the Gemma-2B language model. It is trained to be a versatile and …

Cited by 108 · All 2 versions

[PDF] arxiv.org

No filter: Cultural and socioeconomic diversity in contrastive vision-language models

A Pouget, L Beyer, E Bugliarello, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

We study cultural and socioeconomic diversity in contrastive vision-language models
(VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to …

Cited by 7 · All 3 versions

[PDF] arxiv.org

PaliGemma 2: A Family of Versatile VLMs for Transfer

A Steiner, AS Pinto, M Tschannen, D Keysers… - arXiv preprint arXiv …, 2024 - arxiv.org

PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based
on the Gemma 2 family of language models. We combine the SigLIP-So400m vision …

Cited by 1
