Self-Discover: Large language models self-compose reasoning structures

P Zhou, J Pujara, X Ren, X Chen… - Advances in …, 2025 - proceedings.neurips.cc
We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the
task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for …

Key-point-driven data synthesis with its enhancement on mathematical reasoning

Y Huang, X Liu, Y Gong, Z Gou, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown great potential in complex reasoning tasks, yet
their performance is often hampered by the scarcity of high-quality and reasoning-focused …

Determinants of LLM-assisted decision-making

E Eigner, T Händler - arXiv preprint arXiv:2402.17385, 2024 - arxiv.org
Decision-making is a fundamental capability in everyday life. Large Language Models
(LLMs) provide multifaceted support in enhancing human decision-making processes …

Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models

S Sicari, JF Cevallos M, A Rizzardi… - ACM Computing …, 2024 - dl.acm.org
This survey summarises the most recent methods for building and assessing helpful, honest,
and harmless neural language models, considering small, medium, and large-size models …

DARG: Dynamic evaluation of large language models via adaptive reasoning graph

Z Zhang, J Chen, D Yang - Advances in Neural Information …, 2025 - proceedings.neurips.cc
The current paradigm of evaluating Large Language Models (LLMs) through static
benchmarks comes with significant limitations, such as vulnerability to data contamination …

DyVal: Dynamic evaluation of large language models for reasoning tasks

K Zhu, J Chen, J Wang, NZ Gong, D Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have achieved remarkable performance in various
evaluation benchmarks. However, concerns are raised about potential data contamination in …

Graph-enhanced large language models in asynchronous plan reasoning

F Lin, E La Malfa, V Hofmann, EM Yang, A Cohn… - arXiv preprint arXiv …, 2024 - arxiv.org
Planning is a fundamental property of human intelligence. Reasoning about asynchronous
plans is challenging since it requires sequential and parallel planning to optimize time costs …

Exposing limitations of language model agents in sequential-task compositions on the web

H Furuta, Y Matsuo, A Faust, I Gur - arXiv preprint arXiv:2311.18751, 2023 - arxiv.org
Language model agents (LMA) recently emerged as a promising paradigm on multi-step
decision making tasks, often outperforming humans and other reinforcement learning …

VipAct: Visual-perception enhancement via specialized VLM agent collaboration and tool-use

Z Zhang, R Rossi, T Yu, F Dernoncourt… - arXiv preprint arXiv …, 2024 - arxiv.org
While vision-language models (VLMs) have demonstrated remarkable performance across
various tasks combining textual and visual information, they continue to struggle with fine …

When reasoning meets information aggregation: A case study with sports narratives

Y Hu, K Song, S Cho, X Wang, W Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Reasoning is most powerful when an LLM accurately aggregates relevant information. We
examine the critical role of information aggregation in reasoning by requiring the LLM to …