MetaMath: Bootstrap your own mathematical questions for large language models

L Yu, W Jiang, H Shi, J Yu, Z Liu, Y Zhang… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs) have pushed the limits of natural language understanding
and exhibited excellent problem-solving ability. Despite the great success, most existing …

Adapting large language models for education: Foundational capabilities, potentials, and challenges

Q Li, L Fu, W Zhang, X Chen, J Yu, W Xia… - arxiv preprint arxiv …, 2023 - arxiv.org
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …

GenArtist: Multimodal LLM as an agent for unified image generation and editing

Z Wang, A Li, Z Li, X Liu - arxiv preprint arxiv:2407.05600, 2024 - arxiv.org
Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …

Trigo: Benchmarking formal mathematical proof reduction for generative language models

J Xiong, J Shen, Y Yuan, H Wang, Y Yin, Z Liu… - arxiv preprint arxiv …, 2023 - arxiv.org
Automated theorem proving (ATP) has become an appealing domain for exploring the
reasoning ability of the recent successful generative language models. However, current …

DeepSeek-Prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search

H Xin, ZZ Ren, J Song, Z Shao, W Zhao… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce DeepSeek-Prover-V1.5, an open-source language model designed for
theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both …

Lean-GitHub: Compiling GitHub Lean repositories for a versatile Lean prover

Z Wu, J Wang, D Lin, K Chen - arxiv preprint arxiv:2407.17227, 2024 - arxiv.org
Recently, large language models have presented promising results in aiding formal
mathematical reasoning. However, their performance is restricted due to the scarcity of …

Formal mathematical reasoning: A new frontier in AI

K Yang, G Poesia, J He, W Li, K Lauter… - arxiv preprint arxiv …, 2024 - arxiv.org
AI for Mathematics (AI4Math) is not only intriguing intellectually but also crucial for AI-driven
discovery in science, engineering, and beyond. Extensive efforts on AI4Math have mirrored …

Benchmarking large language models for math reasoning tasks

K Seßler, Y Rong, E Gözlüklü, E Kasneci - arxiv preprint arxiv:2408.10839, 2024 - arxiv.org
The use of Large Language Models (LLMs) in mathematical reasoning has become a
cornerstone of related research, demonstrating the intelligence of these models and …

Trove: Inducing verifiable and efficient toolboxes for solving programmatic tasks

Z Wang, D Fried, G Neubig - arxiv preprint arxiv:2401.12869, 2024 - arxiv.org
Language models (LMs) can solve tasks such as answering questions about tables or
images by writing programs. However, using primitive functions often leads to verbose and …

AutoVerus: Automated proof generation for Rust code

C Yang, X Li, MRH Misu, J Yao, W Cui, Y Gong… - arxiv preprint arxiv …, 2024 - arxiv.org
Generative AI has shown its values for many software engineering tasks. Still in its infancy,
large language model (LLM)-based proof generation lags behind LLM-based code …