A survey of machine learning for big code and naturalness
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …
A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software engineering
activity that embraces a broad range of applications, including but not limited to code …
Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
Coder reviewer reranking for code generation
Sampling diverse programs from a code language model and reranking with model
likelihood is a popular method for code generation but it is prone to preferring degenerate …
WILDS: A benchmark of in-the-wild distribution shifts
Distribution shifts—where the training distribution differs from the test distribution—can
substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …
A novel neural source code representation based on abstract syntax tree
Exploiting machine learning techniques for analyzing programs has attracted much
attention. One key problem is how to represent code fragments well for follow-up analysis …
Learning and evaluating contextual embedding of source code
Recent research has achieved impressive results on understanding and improving source
code by building up on machine-learning techniques developed for natural languages. A …
NatGen: generative pre-training by "naturalizing" source code
Pre-trained generative language models (e.g., PLBART, CodeT5, SPT-Code) for source
code yielded strong results on several tasks in the past few years, including code generation …
code2vec: Learning distributed representations of code
We present a neural model for representing snippets of code as continuous distributed
vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed …
code2seq: Generating sequences from structured representations of code
The ability to generate natural language sequences from source code snippets has a variety
of applications such as code summarization, documentation, and retrieval. Sequence-to …