Chain of Thoughtlessness? An Analysis of CoT in Planning

K Stechly, K Valmeekam… - Advances in Neural …, 2025 - proceedings.neurips.cc
Large language model (LLM) performance on reasoning problems typically does not
generalize out of distribution. Previous work has claimed that this can be mitigated with …

Eureka: Evaluating and understanding large foundation models

V Balachandran, J Chen, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rigorous and reproducible evaluation is critical for assessing the state of the art and for
guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due …

“I Want It That Way”: Enabling Interactive Decision Support Using Large Language Models and Constraint Programming

C Lawless, J Schoeffer, L Le, K Rowan, S Sen… - ACM Transactions on …, 2024 - dl.acm.org
A critical factor in the success of many decision support systems is the accurate modeling of
user preferences. Psychology research has demonstrated that users often develop their …

BENCHAGENTS: Automated Benchmark Creation with Agent Interaction

N Butt, V Chandrasekaran, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluations are limited by benchmark availability. As models evolve, there is a need to
create benchmarks that can measure progress on new generative capabilities. However …

From instructions to constraints: Language model alignment with automatic constraint verification

F Wang, C Shang, S Jain, S Wang, Q Ning… - arXiv preprint arXiv …, 2024 - arxiv.org
User alignment is crucial for adapting general-purpose language models (LMs) to
downstream tasks, but human annotations are often not available for all types of instructions …

Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models

KU Qasim, J Zhang, T Alsahfi, AUR Butt - arXiv preprint arXiv:2501.02026, 2025 - arxiv.org
Enhancing the reasoning capabilities of Large Language Models remains a critical
challenge in artificial intelligence. We introduce RDoLT, Recursive Decomposition of Logical …

The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests

L Madmoni, A Zait, I Labzovsky, D Karmon - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI agents are often expected to respond to complex user requests that have No
One Right Answer (NORA), e.g., "design a vegetarian meal plan below 1800 calories". Such …

[HTML] Aligning to constraints for data-efficient language model customization

F Wang, C Shang, S Wang, S Jain, Q Ning, B Min… - 2025 - amazon.science
General-purpose language models (LMs) are aligned to diverse user intents, but fall short
when it comes to specific applications. While finetuning is the default method for customized …

[BOOK][B] Towards Trustworthy Machine Learning: An Integer Programming Approach

CA Lawless - 2024 - search.proquest.com
Despite the proliferation of machine learning (ML) in a multitude of applications, current
black-box models, such as deep learning, remain hard to understand, critique, and judge by …