Google Učenjak

Shrani Navedi Navedeno v 231 virih Sorodni članki Vse različice: 6 V obliki HTML

Mm-llms: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arxiv preprint arxiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Shrani Navedi Navedeno v 3106 virih Sorodni članki Vse različice: 4 V obliki HTML

The llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arxiv preprint arxiv …, 2024 - arxiv.org

Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Shrani Navedi Navedeno v 599 virih Sorodni članki Vse različice: 5 V obliki HTML

Cogvlm: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi… - Advances in …, 2025 - proceedings.neurips.cc

We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular\emph {shallow alignment} method which maps image features into the …

Shrani Navedi Navedeno v 215 virih Sorodni članki Vse različice: 5 V obliki HTML

Cambrian-1: A fully open, vision-centric exploration of multimodal llms

P Tong, E Brown, P Wu, S Woo… - Advances in …, 2025 - proceedings.neurips.cc

We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Shrani Navedi Navedeno v 166 virih Sorodni članki Vse različice: 4 V obliki HTML

What matters when building vision-language models?

H Laurençon, L Tronchon, M Cord… - Advances in Neural …, 2025 - proceedings.neurips.cc

The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …

Shrani Navedi Navedeno v 403 virih Sorodni članki Vse različice: 4

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Shrani Navedi Navedeno v 115 virih Sorodni članki Vse različice: 5 V obliki HTML

Internlm-xcomposer2-4khd: A pioneering large vision-language model handling resolutions from 336 pixels to 4k hd

X Dong, P Zhang, Y Zang, Y Cao… - Advances in …, 2025 - proceedings.neurips.cc

Abstract The Large Vision-Language Model (LVLM) field has seen significant
advancements, yet its progression has been hindered by challenges in comprehending fine …

Shrani Navedi Navedeno v 213 virih Sorodni članki Vse različice: 6 V obliki HTML

[PDF] thecvf.com

Generative multimodal models are in-context learners

Q Sun, Y Cui, X Zhang, F Zhang, Q Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions which current multimodal systems largely struggle to imitate. In this work …