Overview and importance of data quality for machine learning tasks

A Jain, H Patel, L Nagalapatti, N Gupta… - Proceedings of the 26th …, 2020 - dl.acm.org
It is well understood from literature that the performance of a machine learning (ML) model is
upper bounded by the quality of the data. While researchers and practitioners have focused …

A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

Faster sorting algorithms discovered using deep reinforcement learning

DJ Mankowitz, A Michi, A Zhernov, M Gelmi, M Selvi… - Nature, 2023 - nature.com
Fundamental algorithms such as sorting or hashing are used trillions of times on any given
day. As demand for computation grows, it has become critical for these algorithms to be as …

Gorilla: Large language model connected with massive APIs

SG Patil, T Zhang, X Wang, JE Gonzalez - arXiv preprint arXiv:2305.15334, 2023 - arxiv.org
Large Language Models (LLMs) have seen an impressive wave of advances recently, with
models now excelling in a variety of tasks, such as mathematical reasoning and program …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

CodeRL: Mastering code generation through pretrained models and deep reinforcement learning

H Le, Y Wang, AD Gotmare… - Advances in Neural …, 2022 - proceedings.neurips.cc
Program synthesis or code generation aims to generate a program that satisfies a problem
specification. Recent approaches using large-scale pretrained language models (LMs) have …

Competition-level code generation with AlphaCode

Y Li, D Choi, J Chung, N Kushman, J Schrittwieser… - Science, 2022 - science.org
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist
programmers or even generate programs themselves could make programming more …

Program synthesis with large language models

J Austin, A Odena, M Nye, M Bosma… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …

Show your work: Scratchpads for intermediate computation with language models

M Nye, AJ Andreassen, G Gur-Ari… - arXiv preprint arXiv …, 2021 - arxiv.org
Large pre-trained language models perform remarkably well on tasks that can be done "in
one pass", such as generating realistic text or synthesizing computer programs. However …

Jigsaw: Large language models meet program synthesis

N Jain, S Vaidyanath, A Iyer, N Natarajan… - Proceedings of the 44th …, 2022 - dl.acm.org
Large pre-trained language models such as GPT-3 [10], Codex [11], and Google's language
model [7] are now capable of generating code from natural language specifications of …