The rise and potential of large language model based agents: A survey
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …
MM-LLMs: Recent advances in multimodal large language models
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
MiniGPT-4: Enhancing vision-language understanding with advanced large language models
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly
generating websites from handwritten text and identifying humorous elements within …
Improved baselines with visual instruction tuning
Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …
Video-LLaVA: Learning united visual representation by alignment before projection
The Large Vision-Language Model (LVLM) has enhanced the performance of various
downstream tasks in visual-language understanding. Most existing approaches encode …
MM1: Methods, analysis and insights from multimodal LLM pre-training
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …
A survey on multimodal large language models
Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …
Evaluating object hallucination in large vision-language models
Inspired by the superior language abilities of large language models (LLM), large vision-
language models (LVLM) have been recently explored by integrating powerful LLMs for …
3D-LLM: Injecting the 3D world into large language models
Large language models (LLMs) and Vision-Language Models (VLMs) have been proved to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However previous methods have …