Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

ChartLlama: A multimodal LLM for chart understanding and generation

Y Han, C Zhang, X Chen, X Yang, Z Wang, G Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Multi-modal large language models have demonstrated impressive performance on most
vision-language tasks. However, these models generally lack the understanding capabilities …

OneChart: Purify the chart structural extraction via one auxiliary token

J Chen, L Kong, H Wei, C Liu, Z Ge, L Zhao… - Proceedings of the …, 2024 - dl.acm.org
Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and
so forth. Even advanced large vision-language models (LVLMs) with billions of parameters …

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …

Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model

W Zhang, Z Cheng, Y He, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …

Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction

Q Zhang, VSJ Huang, B Wang, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Document parsing is essential for converting unstructured and semi-structured documents,
such as contracts, academic papers, and invoices, into structured, machine-readable data …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have developed rapidly in recent years. Building on these
powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

ChartX & ChartVLM: A versatile benchmark and foundation model for complicated chart reasoning

R Xia, B Zhang, H Ye, X Yan, Q Liu, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Many versatile multi-modal large language models (MLLMs) have emerged in recent years.
However, their capacity to query information depicted in visual charts and …

CDM: A reliable metric for fair and accurate formula recognition evaluation

B Wang, F Wu, L Ouyang, Z Gu, R Zhang, R Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
Formula recognition presents significant challenges due to the complicated structure and
varied notation of mathematical expressions. Despite continuous advancements in formula …

ChartCheck: Explainable fact-checking over real-world chart images

M Akhtar, N Subedi, V Gupta… - Findings of the …, 2024 - aclanthology.org
Whilst fact verification has attracted substantial interest in the natural language processing
community, verifying misinforming statements against data visualizations such as charts has …