Data quality for software vulnerability datasets

R Croft, MA Babar, MM Kholoosi - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
The use of learning-based techniques to achieve automated software vulnerability detection
has been of longstanding interest within the software security domain. These data-driven …

Automating code-related tasks through transformers: The impact of pre-training

R Tufano, L Pascarella, G Bavota - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Transformers have gained popularity in the software engineering (SE) literature. These deep
learning models are usually pre-trained through a self-supervised objective, meant to …

Vulnerabilities and Security Patches Detection in OSS: A Survey

R Lin, Y Fu, W Yi, J Yang, J Cao, Z Dong, F Xie… - ACM Computing …, 2024 - dl.acm.org
Over the past decade, Open Source Software (OSS) has experienced rapid growth and
widespread adoption, attributed to its openness and editability. However, this expansion has …

Invalidator: Automated patch correctness assessment via semantic and syntactic reasoning

T Le-Cong, DM Luong, XBD Le, D Lo… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Automated program repair (APR) faces the challenge of test overfitting, where generated
patches pass validation tests but fail to generalize. Existing methods for patch assessment …

Enhancing security patch identification by capturing structures in commits

B Wu, S Liu, R Feng, X Xie, J Siow… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
With the rapid increasing number of open source software (OSS), the majority of the software
vulnerabilities in the open source components are fixed silently, which leads to the deployed …

CCRep: Learning code change representations via pre-trained code model and query back

Z Liu, Z Tang, X Xia, X Yang - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Representing code changes as numeric feature vectors, i.e., code change representations, is
usually an essential step to automate many software engineering tasks related to code …

The devil is in the tails: How long-tailed code distributions impact large language models

X Zhou, K Kim, B Xu, J Liu, DG Han, D Lo - arXiv preprint arXiv …, 2023 - arxiv.org
Learning-based techniques, especially advanced Large Language Models (LLMs) for code,
have gained considerable popularity in various software engineering (SE) tasks. However …

Fine-grained commit-level vulnerability type prediction by CWE tree structure

S Pan, L Bao, X Xia, D Lo, S Li - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Identifying security patches via code commits to allow early warnings and timely fixes for
Open Source Software (OSS) has received increasing attention. However, the existing …

SecureQwen: Leveraging LLMs for vulnerability detection in Python codebases

A Mechri, MA Ferrag, M Debbah - Computers & Security, 2025 - Elsevier
Identifying vulnerabilities in software code is crucial for ensuring the security of modern
systems. However, manual detection requires expert knowledge and is time-consuming …

PatchFinder: A two-phase approach to security patch tracing for disclosed vulnerabilities in open-source software

K Li, J Zhang, S Chen, H Liu, Y Liu… - Proceedings of the 33rd …, 2024 - dl.acm.org
Open-source software (OSS) vulnerabilities are increasingly prevalent, emphasizing the
importance of security patches. However, in widely used security platforms like NVD, a …