A catalog of data smells for coding tasks
Large Language Models (LLMs) are increasingly becoming fundamental in supporting
software developers in coding tasks. The massive datasets used for training LLMs are often …
Security of Language Models for Code: A Systematic Literature Review
Language models for code (CodeLMs) have emerged as powerful tools for code-related
tasks, outperforming traditional methods and standard machine learning approaches …
A survey on large language models for software engineering
Software Engineering (SE) is the systematic design, development, and maintenance of
software applications, underpinning the digital infrastructure of our modern world. Very …
DeVAIC: A tool for security assessment of AI-generated code
Context: AI code generators are revolutionizing code writing and software development, but
their training on large datasets, including potentially untrusted source code, raises security …
A survey of neural code intelligence: Paradigms, advances and beyond
Neural Code Intelligence--leveraging deep learning to understand, generate, and optimize
code--holds immense potential for transformative impacts on the whole society. Bridging the …
Measuring impacts of poisoning on model parameters and embeddings for large language models of code
Large language models (LLMs) have revolutionized software development practices, yet
concerns about their safety have arisen, particularly regarding hidden backdoors, aka …
DeCE: Deceptive cross-entropy loss designed for defending backdoor attacks
Code Language Models (CLMs), particularly those leveraging deep learning, have achieved
significant success in the code intelligence domain. However, the issue of security, particularly …
Eliminating backdoors in neural code models via trigger inversion
Neural code models (NCMs) have been widely used for addressing various code
understanding tasks, such as defect detection and clone detection. However, numerous …
An input-denoising-based defense against stealthy backdoor attacks in large language models for code
Y Qu, S Huang, X Chen, T Bai, Y Yao - Information and Software …, 2025 - Elsevier
Context: Large Language Models are becoming integral to software development.
They are trained on open data from platforms like GitHub, making them vulnerable to …
LateBA: Latent Backdoor Attack on Deep Bug Search via Infrequent Execution Codes
Backdoor attacks can mislead deep bug search models by exploring model-sensitive
assembly code, which can change alerts to benign results and cause buggy binaries to …