A catalog of data smells for coding tasks

A Vitale, R Oliveto, S Scalabrino - ACM Transactions on Software …, 2024 - dl.acm.org
Large Language Models (LLMs) are increasingly becoming fundamental in supporting
software developers in coding tasks. The massive datasets used for training LLMs are often …

Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough?

A Vitale, A Mastropaolo, R Oliveto, M Di Penta… - arXiv preprint arXiv …, 2025 - arxiv.org
Automated code summarization is a long-standing goal for code comprehension. This task
automatically generates documentation using a given method. Deep Learning (DL)-based …