Formal mathematical reasoning: A new frontier in AI

K Yang, G Poesia, J He, W Li, K Lauter… - ar…

… formally verified code generation through self-improving translation and treefinement

P Aggarwal, B Parno, S Welleck - arXiv preprint arXiv:2412.06176, 2024 - arxiv.org
Automated code generation with large language models has gained significant traction, but
there remains no guarantee on the correctness of generated code. We aim to use formal …

GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?

Y Zhou, H Liu, Z Chen, Y Tian, B Chen - arXiv preprint arXiv:2502.05252, 2025 - arxiv.org
Long-context large language models (LLMs) have recently shown strong performance in
information retrieval and long-document QA. However, to tackle the most challenging …

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

D Zhang, J Wang, T Sun - arXiv preprint arXiv:2502.11901, 2025 - arxiv.org
Existing LMs struggle with proof-oriented programming due to data scarcity, which manifests
in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages …

dafny-annotator: AI-Assisted Verification of Dafny Programs

G Poesia, C Loughridge, N Amin - arXiv preprint arXiv:2411.15143, 2024 - arxiv.org
Formal verification has the potential to drastically reduce software bugs, but its high
additional cost has hindered large-scale adoption. While Dafny presents a promise to …

Dafny as Verification-Aware Intermediate Language for Code Generation

YC Li, S Zetzsche, S Somayyajula - arXiv preprint arXiv:2501.06283, 2025 - arxiv.org
Using large language models (LLMs) to generate source code from natural language
prompts is a popular and promising idea with a wide range of applications. One of its …

Automated Program Repair of Arithmetic Programs in Dafny using Large Language Models

V Wu - 2024 - search.proquest.com
Faculdade de Engenharia da Universidade do Porto. Automated Program …