The rise and potential of large language model based agents: A survey

BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
J Li, D Li, S Savarese, S Hoi - International conference on …, 2023 - proceedings.mlr.press
The cost of vision-and-language pre-training has become increasingly prohibitive due to
end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and …

MM1: methods, analysis and insights from multimodal LLM pre-training
B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …