Google Scholar

Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library

Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Cited by 143 · All 7 versions

[PDF] arxiv.org

The revolution of multimodal large language models: a survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Cited by 46 · All 9 versions

[PDF] neurips.cc

What matters when building vision-language models?

H Laurençon, L Tronchon, M Cord… - Advances in Neural …, 2025 - proceedings.neurips.cc

The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …

Cited by 166 · All 4 versions

[PDF] thecvf.com

Monkey: Image resolution and text label are important things for large multi-modal models

Z Li, B Yang, Q Liu, Z Ma, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large Multimodal Models (LMMs) have shown promise in vision-language tasks but
struggle with high-resolution input and detailed scene understanding. Addressing these …

Cited by 234 · All 6 versions

[PDF] thecvf.com

Generative multimodal models are in-context learners

Q Sun, Y Cui, X Zhang, F Zhang, Q Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions which current multimodal systems largely struggle to imitate. In this work …

Cited by 213 · All 6 versions

[PDF] arxiv.org

OpenVLA: An open-source vision-language-action model

MJ Kim, K Pertsch, S Karamcheti, T Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org

Large policies pretrained on a combination of Internet-scale vision-language data and
diverse robot demonstrations have the potential to change how we teach robots new skills …

Cited by 225 · All 3 versions

[PDF] openreview.net

Prismatic VLMs: Investigating the design space of visually-conditioned language models

S Karamcheti, S Nair, A Balakrishna… - … on Machine Learning, 2024 - openreview.net

Visually-conditioned language models (VLMs) have seen growing adoption in applications
such as visual dialogue, scene understanding, and robotic task planning; adoption that has …

Cited by 83 · All 7 versions

[PDF] thecvf.com

MoReVQA: Exploring modular reasoning models for video question answering

J Min, S Buch, A Nagrani, M Cho… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper addresses the task of video question answering (videoQA) via a decomposed
multi-stage modular reasoning framework. Previous modular methods have shown promise …

Cited by 24 · All 7 versions

[PDF] arxiv.org

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org

Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …

Cited by 59 · All 2 versions

[PDF] thecvf.com

ViTamin: Designing scalable vision models in the vision-language era

J Chen, Q Yu, X Shen, A Yuille… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent breakthroughs in vision-language models (VLMs) start a new page in the vision
community. The VLMs provide stronger and more generalizable feature embeddings …

Cited by 13 · All 9 versions

Saved to My library

PaLI-3 vision language models: Smaller, faster, stronger

Combating misinformation in the age of LLMs: Opportunities and challenges

The revolution of multimodal large language models: a survey

What matters when building vision-language models?

Monkey: Image resolution and text label are important things for large multi-modal models

Generative multimodal models are in-context learners

OpenVLA: An open-source vision-language-action model

Prismatic VLMs: Investigating the design space of visually-conditioned language models

MoReVQA: Exploring modular reasoning models for video question answering

Multilingual large language model: A survey of resources, taxonomy and frontiers

ViTamin: Designing scalable vision models in the vision-language era