Mathematical discoveries from program search with large language models

B Romera-Paredes, M Barekatain, A Novikov, M Balog… - Nature, 2024 - nature.com
Large language models (LLMs) have demonstrated tremendous capabilities in solving
complex tasks, from quantitative reasoning to understanding natural language. However …
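
The work behind this entry (FunSearch) pairs an LLM that proposes programs with an automatic evaluator inside an evolutionary loop. A minimal sketch of that pattern, with `llm_propose` and `evaluate` as hypothetical stand-ins for a real model call and a problem-specific scorer:

```python
import random

def llm_propose(parents: list[str]) -> str:
    """Hypothetical stand-in for an LLM call that takes the best
    programs so far as few-shot context and returns a new variant."""
    return random.choice(parents)  # placeholder "mutation"

def evaluate(program: str) -> float:
    """Problem-specific scorer; the real system executes the program
    and measures its result. Placeholder: prefer shorter programs."""
    return -len(program)

def program_search(seed: str, rounds: int = 100, pool_size: int = 10) -> str:
    """Evolutionary loop: keep a scored pool, ask the model for
    variants of the top members, retain only the best pool_size."""
    pool = [(evaluate(seed), seed)]
    for _ in range(rounds):
        parents = [p for _, p in sorted(pool, reverse=True)[:2]]
        child = llm_propose(parents)
        pool.append((evaluate(child), child))
        pool = sorted(pool, reverse=True)[:pool_size]  # truncation selection
    return max(pool)[1]
```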

Quiet-STaR: Language models can teach themselves to think before speaking

E Zelikman, GR Harik, Y Shao, V Jayasiri… - First Conference on …, 2024 - openreview.net
When writing and talking, people sometimes pause to think. Although reasoning-focused
works have often framed reasoning as a method of answering questions or completing …

CoderEval: A benchmark of pragmatic code generation with generative pre-trained models

H Yu, B Shen, D Ran, J Zhang, Q Zhang, Y Ma… - Proceedings of the 46th …, 2024 - dl.acm.org
Code generation models based on the pre-training and fine-tuning paradigm have been
increasingly attempted by both academia and industry, resulting in well-known industrial …

Buffer of thoughts: Thought-augmented reasoning with large language models

L Yang, Z Yu, T Zhang, S Cao, M Xu… - Advances in …, 2025 - proceedings.neurips.cc
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented
reasoning approach for enhancing accuracy, efficiency and robustness of large language …

Phenomenal yet puzzling: Testing inductive reasoning capabilities of language models with hypothesis refinement

L Qiu, L Jiang, X Lu, M Sclar, V Pyatkin… - arXiv preprint arXiv …, 2023 - arxiv.org
The ability to derive underlying principles from a handful of observations and then
generalize to novel situations--known as inductive reasoning--is central to human …
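
The hypothesis refinement in the title can be read as a propose-test-revise loop: draft a rule from a handful of observations, check it against all of them, and feed failures back to the model. A minimal sketch under that reading; `propose_hypothesis` is a hypothetical stand-in for the model call, and `eval` is used only for illustration:

```python
def propose_hypothesis(examples, feedback=""):
    """Hypothetical stand-in for an LLM call that writes a candidate
    rule as a Python expression over x."""
    return "x * 2"

def refine_loop(examples, max_rounds=5):
    """Propose a rule, test it on every observation, and refine with
    the failing cases as feedback until the rule is consistent."""
    feedback = ""
    for _ in range(max_rounds):
        rule = propose_hypothesis(examples, feedback)
        failures = [(x, y) for x, y in examples
                    if eval(rule, {"x": x}) != y]  # test on all observations
        if not failures:
            return rule  # consistent with every observation
        feedback = f"rule {rule!r} failed on {failures}"
    return None

print(refine_loop([(1, 2), (3, 6)]))  # -> "x * 2"
```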

CRUXEval: A benchmark for code reasoning, understanding and execution

A Gu, B Rozière, H Leather, A Solar-Lezama… - arXiv preprint arXiv …, 2024 - arxiv.org
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …
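
CRUXEval pairs each function with an input-output example and asks models to predict one side from the other. An illustrative item in that style (the function and assertions below are invented, not drawn from the benchmark):

```python
# A CRUXEval-style item: a short Python function plus one
# input-output pair.
def f(text):
    return text.replace(" ", "_").upper()

# Output prediction: given f and the input, complete the right side.
assert f("hello world") == "HELLO_WORLD"

# Input prediction: given f and the output, supply any input that works.
assert f("a b") == "A_B"
```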

If LLM is the wizard, then code is the wand: A survey on how code empowers large language models to serve as intelligent agents

K Yang, J Liu, J Wu, C Yang, YR Fung, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The prominent large language models (LLMs) of today differ from past language models not
only in size, but also in the fact that they are trained on a combination of natural language …

SelfEvolve: A code evolution framework via large language models

S Jiang, Y Wang, Y Wang - arXiv preprint arXiv:2306.02907, 2023 - arxiv.org
Large language models (LLMs) have already revolutionized code generation, after being
pretrained on publicly available code data. However, while various methods have been …
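
One common shape for such LLM-driven code evolution is a draft-execute-revise loop in which interpreter errors become the revision signal. A minimal sketch under that assumption; `llm_write_code` and `check` are hypothetical stand-ins for a real model call and a task-specific test:

```python
import traceback

def llm_write_code(task: str, error: str = "") -> str:
    """Hypothetical stand-in for an LLM call that drafts (or, given a
    traceback, revises) a solution; stub returns a fixed snippet."""
    return "def solve(n):\n    return n * n"

def check(solve):
    assert solve(3) == 9  # task-specific unit test

def evolve(task: str, max_rounds: int = 3):
    """Draft code, execute the tests, and feed any traceback back to
    the model as the signal for the next revision."""
    error = ""
    for _ in range(max_rounds):
        code = llm_write_code(task, error)
        namespace: dict = {}
        try:
            exec(code, namespace)        # define the candidate
            check(namespace["solve"])    # run the tests
            return code                  # tests pass: accept revision
        except Exception:
            error = traceback.format_exc()  # interpreter feedback
    return None

print(evolve("square a number"))
```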

Language model crossover: Variation through few-shot prompting

E Meyerson, MJ Nelson, H Bradley, A Gaier… - ACM Transactions on …, 2024 - dl.acm.org
This article pursues the insight that language models naturally enable an intelligent variation
operator similar in spirit to evolutionary crossover. In particular, language models of …
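
The crossover operator the abstract describes can be approximated by concatenating parent genomes into a few-shot prompt and treating the model's continuation as the offspring. A minimal sketch; `llm_complete` is a hypothetical stub for a real sampling call:

```python
import random

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for sampling a continuation from a
    language model (stub: echoes one of the parents)."""
    return prompt.splitlines()[-2]

def crossover(population: list[str], k: int = 3) -> str:
    """Few-shot crossover: list k parents as a prompt and let the
    model's continuation act as the child, blending their traits."""
    parents = random.sample(population, k)
    prompt = "\n".join(parents) + "\n"
    return llm_complete(prompt)

pop = ["red circle", "blue square", "green triangle", "red square"]
print(crossover(pop))
```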

Execution-based evaluation for open-domain code generation

Z Wang, S Zhou, D Fried, G Neubig - arXiv preprint arXiv:2212.10481, 2022 - arxiv.org
To extend the scope of coding queries to more realistic settings, we propose ODEX, the first
Open-Domain EXecution-based natural language (NL) to Python code generation dataset …
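
Execution-based evaluation judges a generated program by running it against test cases rather than by string match against a reference. A minimal harness in that spirit (not the ODEX evaluator itself):

```python
def passes(candidate: str, tests: str) -> bool:
    """Execute a generated candidate and its test cases; the candidate
    counts as correct only if every assertion runs without error."""
    namespace: dict = {}
    try:
        exec(candidate, namespace)   # define the candidate function
        exec(tests, namespace)       # run the accompanying test cases
        return True
    except Exception:
        return False

candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes(candidate, tests))  # True
```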