- Academic Search

Improving multi-agent debate with sparse communication topology

Y Li, Y Du, J Zhang, L Hou, P Grabowski, Y Li… - arxiv preprint arxiv …, 2024 - arxiv.org

Multi-agent debate has proven effective in improving large language models quality for
reasoning and factuality tasks. While various role-playing strategies in multi-agent debates …

Cited by 8

R-cot: Reverse chain-of-thought problem generation for geometric reasoning in large multimodal models

L Deng, Y Liu, B Li, D Luo, L Wu, C Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning
due to a lack of high-quality image-text paired data. Current geometric data generation …

Cited by 4

SketchAgent: Language-Driven Sequential Sketch Generation

Y Vinker, TR Shaham, K Zheng, A Zhao, JE Fan… - arxiv preprint arxiv …, 2024 - arxiv.org

Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and
visual communication that spans various disciplines. While artificial systems have driven …

Cited by 2

Errorradar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection

Y Yan, S Wang, J Huo, H Li, B Li, J Su, X Gao… - arxiv preprint arxiv …, 2024 - arxiv.org

As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …

Cited by 3

VipAct: Visual-perception enhancement via specialized vlm agent collaboration and tool-use

Z Zhang, R Rossi, T Yu, F Dernoncourt… - arxiv preprint arxiv …, 2024 - arxiv.org

While vision-language models (VLMs) have demonstrated remarkable performance across
various tasks combining textual and visual information, they continue to struggle with fine …

Cited by 2

Natural Language Inference Improves Compositionality in Vision-Language Models

P Cascante-Bonilla, Y Hou, YT Cao… - arxiv preprint arxiv …, 2024 - arxiv.org

Compositional reasoning in Vision-Language Models (VLMs) remains challenging as these
models often struggle to relate objects, attributes, and spatial relationships. Recent methods …

Visual scratchpads: Enabling global reasoning in vision

A Lotfi, E Fini, S Bengio, M Nabi, E Abbe - arxiv preprint arxiv:2410.08165, 2024 - arxiv.org

Modern vision models have achieved remarkable success in benchmarks where local
features provide critical information about the target. There is now a growing interest in …

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Y Yan, J Su, J He, F Fu, X Zheng, Y Lyu… - arxiv preprint arxiv …, 2024 - arxiv.org

Mathematical reasoning, a core aspect of human cognition, is vital across many domains,
from educational problem-solving to scientific advancements. As artificial general …

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

WC Fan, T Rahman, L Sigal - arxiv preprint arxiv:2412.18072, 2024 - arxiv.org

With advances in foundational and vision-language models, and effective fine-tuning
techniques, a large number of both general and special-purpose models have been …

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

R Luo, Z Zheng, Y Wang, Y Yu, X Ni, Z Lin… - arxiv preprint arxiv …, 2025 - arxiv.org

Chain-of-Thought (CoT) reasoning is widely used to enhance the mathematical reasoning
capabilities of large language models (LLMs). The introduction of process supervision for …

Cited by 1

Visual sketchpad: Sketching as a visual chain of thought for multimodal language models

Improving multi-agent debate with sparse communication topology
