PaliGemma: A versatile 3B VLM for transfer

L Beyer, A Steiner, AS Pinto, A Kolesnikov… - arXiv preprint arXiv …, 2024 - arxiv.org
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m
vision encoder and the Gemma-2B language model. It is trained to be a versatile and …

No filter: Cultural and socioeconomic diversity in contrastive vision-language models

A Pouget, L Beyer, E Bugliarello, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We study cultural and socioeconomic diversity in contrastive vision-language models
(VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to …

PaliGemma 2: A Family of Versatile VLMs for Transfer

A Steiner, AS Pinto, M Tschannen, D Keysers… - arXiv preprint arXiv …, 2024 - arxiv.org
PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based
on the Gemma 2 family of language models. We combine the SigLIP-So400m vision …