A catalog of data smells for coding tasks

A Vitale, R Oliveto, S Scalabrino - ACM Transactions on Software …, 2024 - dl.acm.org
Large Language Models (LLMs) are increasingly becoming fundamental in supporting
software developers in coding tasks. The massive datasets used for training LLMs are often …

Security of Language Models for Code: A Systematic Literature Review

Y Chen, W Sun, C Fang, Z Chen, Y Ge, T Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models for code (CodeLMs) have emerged as powerful tools for code-related
tasks, outperforming traditional methods and standard machine learning approaches …

A survey on large language models for software engineering

Q Zhang, C Fang, Y Xie, Y Zhang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Software Engineering (SE) is the systematic design, development, and maintenance of
software applications, underpinning the digital infrastructure of our modern world. Very …

DeVAIC: A tool for security assessment of AI-generated code

D Cotroneo, R De Luca, P Liguori - Information and Software Technology, 2025 - Elsevier
Context: AI code generators are revolutionizing code writing and software development, but
their training on large datasets, including potentially untrusted source code, raises security …

A survey of neural code intelligence: Paradigms, advances and beyond

Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural Code Intelligence (leveraging deep learning to understand, generate, and optimize
code) holds immense potential for transformative impacts on the whole society. Bridging the …

Measuring impacts of poisoning on model parameters and embeddings for large language models of code

A Hussain, MRI Rabin, MA Alipour - Proceedings of the 1st ACM …, 2024 - dl.acm.org
Large language models (LLMs) have revolutionized software development practices, yet
concerns about their safety have arisen, particularly regarding hidden backdoors, aka …

DeCE: Deceptive cross-entropy loss designed for defending backdoor attacks

G Yang, Y Zhou, X Chen, X Zhang, TY Zhuo… - arXiv preprint arXiv …, 2024 - arxiv.org
Code Language Models (CLMs), particularly those leveraging deep learning, have achieved
significant success in the code intelligence domain. However, the issue of security, particularly …

Eliminating backdoors in neural code models via trigger inversion

W Sun, Y Chen, C Fang, Y Feng, Y Xiao, A Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural code models (NCMs) have been widely used for addressing various code
understanding tasks, such as defect detection and clone detection. However, numerous …

An input-denoising-based defense against stealthy backdoor attacks in large language models for code

Y Qu, S Huang, X Chen, T Bai, Y Yao - Information and Software …, 2025 - Elsevier
Context: Large Language Models are becoming integral to software development.
They are trained on open data from platforms like GitHub, making them vulnerable to …

LateBA: Latent Backdoor Attack on Deep Bug Search via Infrequent Execution Codes

X Yi, G Li, W Huang, X Lin, J Li, Y Liu - Proceedings of the 15th Asia …, 2024 - dl.acm.org
Backdoor attacks can mislead deep bug search models by exploring model-sensitive
assembly code, which can change alerts to benign results and cause buggy binaries to …