Combating misinformation in the age of llms: Opportunities and challenges
Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …
and public trust. The emergence of large language models (LLMs) has great potential to …
The revolution of multimodal large language models: a survey
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …
this reason, inspired by the success of large language models, significant research efforts …
What matters when building vision-language models?
The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …
large language models and vision transformers. Despite the abundance of literature on this …
Monkey: Image resolution and text label are important things for large multi-modal models
Z Li, B Yang, Q Liu, Z Ma, S Zhang… - proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Large Multimodal Models (LMMs) have shown promise in vision-language tasks but
struggle with high-resolution input and detailed scene understanding. Addressing these …
struggle with high-resolution input and detailed scene understanding. Addressing these …
Generative multimodal models are in-context learners
Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions which current multimodal systems largely struggle to imitate. In this work …
simple instructions which current multimodal systems largely struggle to imitate. In this work …
Openvla: An open-source vision-language-action model
Large policies pretrained on a combination of Internet-scale vision-language data and
diverse robot demonstrations have the potential to change how we teach robots new skills …
diverse robot demonstrations have the potential to change how we teach robots new skills …
Prismatic vlms: Investigating the design space of visually-conditioned language models
Visually-conditioned language models (VLMs) have seen growing adoption in applications
such as visual dialogue, scene understanding, and robotic task planning; adoption that has …
such as visual dialogue, scene understanding, and robotic task planning; adoption that has …
Morevqa: Exploring modular reasoning models for video question answering
This paper addresses the task of video question answering (videoQA) via a decomposed
multi-stage modular reasoning framework. Previous modular methods have shown promise …
multi-stage modular reasoning framework. Previous modular methods have shown promise …
Multilingual large language model: A survey of resources, taxonomy and frontiers
Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …
Models to handle and respond to queries in multiple languages, which achieves remarkable …
Vitamin: Designing scalable vision models in the vision-language era
Recent breakthroughs in vision-language models (VLMs) start a new page in the vision
community. The VLMs provide stronger and more generalizable feature embeddings …
community. The VLMs provide stronger and more generalizable feature embeddings …