A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A survey on large language model based autonomous agents

L Wang, C Ma, X Feng, Z Zhang, H Yang… - Frontiers of Computer …, 2024 - Springer
Autonomous agents have long been a research focus in academic and industry
communities. Previous research often focuses on training agents with limited knowledge …

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2023 - proceedings.neurips.cc
Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Lisa: Reasoning segmentation via large language model

X Lai, Z Tian, Y Chen, Y Li, Y Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although perception systems have made remarkable advancements in recent years, they still
rely on explicit human instruction or pre-defined categories to identify the target objects …

mplug-owl: Modularization empowers large language models with multimodality

Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated impressive zero-shot abilities on a
variety of open-ended tasks, while recent research has also explored the use of LLMs for …

Visionllm: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

Instruction tuning for large language models: A survey

S Zhang, L Dong, X Li, S Zhang, X Sun, S Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
This paper surveys research works in the quickly advancing field of instruction tuning (IT),
which can also be referred to as supervised fine-tuning (SFT) …

The dawn of lmms: Preliminary explorations with gpt-4v(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arxiv preprint arxiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Video-llava: Learning united visual representation by alignment before projection

B Lin, Y Ye, B Zhu, J Cui, M Ning, P Jin… - arxiv preprint arxiv …, 2023 - arxiv.org
The Large Vision-Language Model (LVLM) has enhanced the performance of various
downstream tasks in visual-language understanding. Most existing approaches encode …