The rise and potential of large language model based agents: A survey
Z ** language-image pre-training with frozen image encoders and large language models
The cost of vision-and-language pre-training has become increasingly prohibitive due to
end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and …
end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and …
MM1: methods, analysis and insights from multimodal LLM pre-training
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …
In particular, we study the importance of various architecture components and data choices …