- Academic Search

Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arxiv preprint arxiv …, 2024 - arxiv.org

We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

Cited by 29

ChartLlama: A multimodal LLM for chart understanding and generation

Y Han, C Zhang, X Chen, X Yang, Z Wang, G Yu… - arxiv preprint arxiv …, 2023 - arxiv.org

Multi-modal large language models have demonstrated impressive performances on most
vision-language tasks. However, the model generally lacks the understanding capabilities …

Cited by 85

OneChart: Purify the chart structural extraction via one auxiliary token

J Chen, L Kong, H Wei, C Liu, Z Ge, L Zhao… - Proceedings of the …, 2024 - dl.acm.org

Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and
so forth. Even advanced large vision-language models (LVLMs) with billions of parameters …

Cited by 13

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org

Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …

Cited by 14

Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model

W Zhang, Z Cheng, Y He, M Wang, Y Shen… - arxiv preprint arxiv …, 2024 - arxiv.org

Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …

Cited by 10

Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction

Q Zhang, VSJ Huang, B Wang, J Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

Document parsing is essential for converting unstructured and semi-structured documents,
such as contracts, academic papers, and invoices, into structured, machine-readable data …

Cited by 1

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arxiv preprint arxiv …, 2024 - arxiv.org

The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Cited by 5

ChartX & ChartVLM: A versatile benchmark and foundation model for complicated chart reasoning

R Xia, B Zhang, H Ye, X Yan, Q Liu, H Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org

Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged
continuously. However, their capacity to query information depicted in visual charts and …

Cited by 38

CDM: A reliable metric for fair and accurate formula recognition evaluation

B Wang, F Wu, L Ouyang, Z Gu, R Zhang, R Xia… - arxiv preprint arxiv …, 2024 - arxiv.org

Formula recognition presents significant challenges due to the complicated structure and
varied notation of mathematical expressions. Despite continuous advancements in formula …

Cited by 3

ChartCheck: Explainable fact-checking over real-world chart images

M Akhtar, N Subedi, V Gupta… - Findings of the …, 2024 - aclanthology.org

Whilst fact verification has attracted substantial interest in the natural language processing
community, verifying misinforming statements against data visualizations such as charts has …

Cited by 2

Structchart: Perception, structuring, reasoning for visual chart understanding
