A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

Deep learning for source code modeling and generation: Models, applications, and challenges

THM Le, H Chen, MA Babar - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) techniques for Natural Language Processing have been evolving
remarkably fast. Recently, the DL advances in language modeling, machine translation, and …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

Program synthesis with large language models

J Austin, A Odena, M Nye, M Bosma… - arxiv preprint arxiv …, 2021 - arxiv.org
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …

What is it like to program with artificial intelligence?

A Sarkar, AD Gordon, C Negreanu, C Poelitz… - arxiv preprint arxiv …, 2022 - arxiv.org
Large language models, such as OpenAI's codex and Deepmind's AlphaCode, can
generate code to solve a variety of problems expressed in natural language. This …

code2vec: Learning distributed representations of code

U Alon, M Zilberstein, O Levy, E Yahav - Proceedings of the ACM on …, 2019 - dl.acm.org
We present a neural model for representing snippets of code as continuous distributed
vectors (``code embeddings''). The main idea is to represent a code snippet as a single fixed …

code2seq: Generating sequences from structured representations of code

U Alon, S Brody, O Levy, E Yahav - arxiv preprint arxiv:1808.01400, 2018 - arxiv.org
The ability to generate natural language sequences from source code snippets has a variety
of applications such as code summarization, documentation, and retrieval. Sequence-to …

Unsupervised translation of programming languages

B Roziere, MA Lachaux… - Advances in neural …, 2020 - proceedings.neurips.cc
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …

Deep code comment generation

X Hu, G Li, X **a, D Lo, Z ** - Proceedings of the 26th conference on …, 2018 - dl.acm.org
During software maintenance, code comments help developers comprehend programs and
reduce additional time spent on reading and navigating source code. Unfortunately, these …

Unifying the perspectives of nlp and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong, H Yu… - arxiv preprint arxiv …, 2023 - arxiv.org
In this work we systematically review the recent advancements in software engineering with
language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 …