ShareGPT4V: Improving large multi-modal models with better captions
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …
Vision language models are blind
Large language models (LLMs) with vision capabilities (e.g., GPT-4o, Gemini 1.5, and Claude
3) are powering countless image-text processing applications, enabling unprecedented …
Math-LLaVA: Bootstrapping mathematical reasoning for multimodal large language models
Large language models (LLMs) have demonstrated impressive reasoning capabilities,
particularly in textual mathematical problem-solving. However, existing open-source image …
MME-Survey: A comprehensive survey on evaluation of multimodal LLMs
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …