A survey on deep learning for software engineering

Y Yang, X Xia, D Lo, J Grundy - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
In 2006, Geoffrey Hinton proposed the concept of training “Deep Neural Networks (DNNs)”
and an improved model training method to break the bottleneck of neural network …

A systematic literature review on the use of deep learning in software engineering research

C Watson, N Cooper, DN Palacio, K Moran… - ACM Transactions on …, 2022 - dl.acm.org
An increasingly popular set of techniques adopted by software engineering (SE)
researchers to automate development tasks are those rooted in the concept of Deep …

CodeXGLUE: A machine learning benchmark dataset for code understanding and generation

S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy… - arXiv preprint arXiv …, 2021 - arxiv.org
Benchmark datasets have a significant impact on accelerating research in programming
language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster …

CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks

R Puri, DS Kung, G Janssen, W Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Over the last several decades, software has been woven into the fabric of every aspect of
our society. As software development surges and code infrastructure of enterprise …

An empirical comparison of pre-trained models of source code

C Niu, C Li, V Ng, D Chen, J Ge… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
While a large number of pre-trained models of source code have been successfully
developed and applied to a variety of software engineering (SE) tasks in recent years, our …

PalmTree: Learning an assembly language model for instruction embedding

X Li, Y Qu, H Yin - Proceedings of the 2021 ACM SIGSAC Conference on …, 2021 - dl.acm.org
Deep learning has demonstrated its strengths in numerous binary analysis tasks, including
function boundary detection, binary code search, function prototype inference, value set …

ProGraML: A graph-based program representation for data flow analysis and compiler optimizations

C Cummins, ZV Fisches, T Ben-Nun… - International …, 2021 - proceedings.mlr.press
Machine learning (ML) is increasingly seen as a viable approach for building
compiler optimization heuristics, but many ML methods cannot replicate even the simplest of …

A survey on machine learning techniques for source code analysis

T Sharma, M Kechagia, S Georgiou, R Tiwari… - arXiv preprint arXiv …, 2021 - arxiv.org
The advancements in machine learning techniques have encouraged researchers to apply
these techniques to a myriad of software engineering tasks that use source code analysis …

Flow2Vec: Value-flow-based precise code embedding

Y Sui, X Cheng, G Zhang, H Wang - Proceedings of the ACM on …, 2020 - dl.acm.org
Code embedding, as an emerging paradigm for source code analysis, has attracted much
attention over the past few years. It aims to represent code semantics through distributed …

Contrastive code representation learning

P Jain, A Jain, T Zhang, P Abbeel, JE Gonzalez… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent work learns contextual representations of source code by reconstructing tokens from
their context. For downstream semantic understanding tasks like summarizing code in …