Reasoning abilities of large language models: In-depth analysis on the Abstraction and Reasoning Corpus

S Lee, W Sim, D Shin, W Seo, J Park, S Lee… - ACM Transactions on …, 2024 - dl.acm.org
The existing methods for evaluating the inference abilities of Large Language Models
(LLMs) have been predominantly results-centric, making it challenging to assess the …

Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model

W Zhang, Z Cheng, Y He, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …

Think twice before assure: Confidence estimation for large language models through reflection on multiple answers

M Li, W Wang, F Feng, F Zhu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence estimation, which aims to evaluate output trustability, is crucial for the application of
large language models (LLMs), especially the black-box ones. Existing confidence estimation …

AutoML-Agent: A multi-agent LLM framework for full-pipeline AutoML

P Trirat, W Jeong, SJ Hwang - arXiv preprint arXiv:2410.02958, 2024 - arxiv.org
Automated machine learning (AutoML) accelerates AI development by automating tasks in
the development pipeline, such as optimal model search and hyperparameter tuning …

Agent-Pro: Learning to evolve via policy-level reflection and optimization

W Zhang, K Tang, H Wu, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models exhibit robust problem-solving capabilities for diverse tasks.
However, most LLM-based agents are designed as specific task solvers with sophisticated …

Training language models to critique with multi-agent feedback

T Lan, W Zhang, C Lyu, S Li, C Xu, H Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Critique ability, a meta-cognitive capability of humans, presents significant challenges for
LLMs to improve. Recent works primarily rely on supervised fine-tuning (SFT) using critiques …

Reasoning and planning with large language models in code development

H Ding, Z Fan, I Guehring, G Gupta, W Ha… - Proceedings of the 30th …, 2024 - dl.acm.org
Large Language Models (LLMs) are revolutionizing the field of code development by
leveraging their deep understanding of code patterns, syntax, and semantics to assist …

Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation

X Wang, Y Li, S Feng, P Yuan, B Pan, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-consistency (SC), leveraging multiple samples from LLMs, shows significant gains on
various reasoning tasks but struggles with free-form generation due to the difficulty of …

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

M Nezhurina, L Cipolina-Kun, M Cherti… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are often described as being instances of foundation
models, that is, models that transfer strongly across various tasks and conditions in few-shot …

Fine-tuning with divergent chains of thought boosts reasoning through self-correction in language models

H Puerto, T Chubakov, X Zhu, HT Madabushi… - 2024 - openreview.net
Requiring a large language model to generate intermediary reasoning steps has been
shown to be an effective way of boosting performance. In fact, instruction tuning on these …