A survey of deep learning for mathematical reasoning
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in
various fields, including science, engineering, finance, and everyday life. The development …
various fields, including science, engineering, finance, and everyday life. The development …
How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
Chameleon: Plug-and-play compositional reasoning with large language models
Large language models (LLMs) have achieved remarkable progress in solving various
natural language processing tasks due to emergent reasoning abilities. However, LLMs …
natural language processing tasks due to emergent reasoning abilities. However, LLMs …
Llava-onevision: Easy visual task transfer
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …
by consolidating our insights into data, models, and visual representations in the LLaVA …
Cambrian-1: A fully open, vision-centric exploration of multimodal llms
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …
centric approach. While stronger language models can enhance multimodal capabilities, the …
Mathverse: Does your multi-modal llm truly see the diagrams in visual math problems?
The remarkable progress of Multi-modal Large Language Models (MLLMs) has gained
unparalleled attention. However, their capabilities in visual math problem-solving remain …
unparalleled attention. However, their capabilities in visual math problem-solving remain …
Learn to explain: Multimodal reasoning via thought chains for science question answering
When answering a question, humans utilize the information available across different
modalities to synthesize a consistent and complete chain of thought (CoT). This process is …
modalities to synthesize a consistent and complete chain of thought (CoT). This process is …
Mathvista: Evaluating mathematical reasoning of foundation models in visual contexts
Although Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit
impressive skills in various domains, their ability for mathematical reasoning within visual …
impressive skills in various domains, their ability for mathematical reasoning within visual …
Large language models for education: A survey and outlook
The advent of Large Language Models (LLMs) has brought in a new era of possibilities in
the realm of education. This survey paper summarizes the various technologies of LLMs in …
the realm of education. This survey paper summarizes the various technologies of LLMs in …
Internlm-xcomposer-2.5: A versatile large vision language model supporting long-contextual input and output
P Zhang, X Dong, Y Zang, Y Cao, R Qian… - arxiv preprint arxiv …, 2024 - arxiv.org
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …
supports long-contextual input and output. IXC-2.5 excels in various text-image …