Improving multi-agent debate with sparse communication topology

Y Li, Y Du, J Zhang, L Hou, P Grabowski, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-agent debate has proven effective in improving large language model quality for
reasoning and factuality tasks. While various role-playing strategies in multi-agent debates …

R-CoT: Reverse chain-of-thought problem generation for geometric reasoning in large multimodal models

L Deng, Y Liu, B Li, D Luo, L Wu, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning
due to a lack of high-quality image-text paired data. Current geometric data generation …

SketchAgent: Language-Driven Sequential Sketch Generation

Y Vinker, TR Shaham, K Zheng, A Zhao, JE Fan… - arXiv preprint arXiv …, 2024 - arxiv.org
Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and
visual communication that spans various disciplines. While artificial systems have driven …

ErrorRadar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection

Y Yan, S Wang, J Huo, H Li, B Li, J Su, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …

VipAct: Visual-perception enhancement via specialized VLM agent collaboration and tool-use

Z Zhang, R Rossi, T Yu, F Dernoncourt… - arXiv preprint arXiv …, 2024 - arxiv.org
While vision-language models (VLMs) have demonstrated remarkable performance across
various tasks combining textual and visual information, they continue to struggle with fine …

Natural Language Inference Improves Compositionality in Vision-Language Models

P Cascante-Bonilla, Y Hou, YT Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Compositional reasoning in Vision-Language Models (VLMs) remains challenging as these
models often struggle to relate objects, attributes, and spatial relationships. Recent methods …

Visual scratchpads: Enabling global reasoning in vision

A Lotfi, E Fini, S Bengio, M Nabi, E Abbe - arXiv preprint arXiv:2410.08165, 2024 - arxiv.org
Modern vision models have achieved remarkable success in benchmarks where local
features provide critical information about the target. There is now a growing interest in …

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Y Yan, J Su, J He, F Fu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning, a core aspect of human cognition, is vital across many domains,
from educational problem-solving to scientific advancements. As artificial general …

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

WC Fan, T Rahman, L Sigal - arXiv preprint arXiv:2412.18072, 2024 - arxiv.org
With advances in foundational and vision-language models, and effective fine-tuning
techniques, a large number of both general and special-purpose models have been …

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

R Luo, Z Zheng, Y Wang, Y Yu, X Ni, Z Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
Chain-of-Thought (CoT) reasoning is widely used to enhance the mathematical reasoning
capabilities of large language models (LLMs). The introduction of process supervision for …