Mm-llms: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arxiv preprint arxiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Qwen technical report

J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …

Sharegpt4v: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - … on Computer Vision, 2024 - Springer
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (eg, data type, quality, and scale) of training data …

mplug-owl2: Revolutionizing multi-modal large language model with modality collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However previous methods have …

Objaverse-xl: A universe of 10m+ 3d objects

M Deitke, R Liu, M Wallingford, H Ngo… - Advances in …, 2023 - proceedings.neurips.cc
Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …

Cambrian-1: A fully open, vision-centric exploration of multimodal llms

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Large language model as attributed training data generator: A tale of diversity and bias

Y Yu, Y Zhuang, J Zhang, Y Meng… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have been recently leveraged as training data generators
for various natural language processing (NLP) tasks. While previous research has explored …