A software engineering perspective on engineering machine learning systems: State of the art and challenges

G Giray - Journal of Systems and Software, 2021 - Elsevier
Context: Advancements in machine learning (ML) lead to a shift from the traditional view of
software development, where algorithms are hard-coded by humans, to ML systems …

Calibration and correctness of language models for code

C Spiess, D Gros, KS Pai, M Pradel… - arXiv preprint arXiv …, 2024 - software-lab.org
Machine learning models are widely used, but can also often be wrong. Users would benefit
from a reliable indication of whether a given output from a given model should be trusted, so …

Prioritizing test inputs for deep neural networks via mutation analysis

Z Wang, H You, J Chen, Y Zhang… - 2021 IEEE/ACM 43rd …, 2021 - ieeexplore.ieee.org
Deep Neural Network (DNN) testing is one of the most widely-used ways to guarantee the
quality of DNNs. However, labeling test inputs to check the correctness of DNN prediction is …

Are machine learning cloud APIs used correctly?

C Wan, S Liu, H Hoffmann, M Maire… - 2021 IEEE/ACM 43rd …, 2021 - ieeexplore.ieee.org
Machine learning (ML) cloud APIs enable developers to easily incorporate learning
solutions into software systems. Unfortunately, ML APIs are challenging to use correctly and …

Automated testing of software that uses machine learning APIs

C Wan, S Liu, S Xie, Y Liu, H Hoffmann… - Proceedings of the 44th …, 2022 - dl.acm.org
An increasing number of software applications incorporate machine learning (ML) solutions
for cognitive tasks that statistically mimic human behaviors. To test such software …

A review and refinement of surprise adequacy

M Weiss, R Chakraborty… - 2021 IEEE/ACM Third …, 2021 - ieeexplore.ieee.org
Surprise Adequacy (SA) is one of the emerging and most promising adequacy criteria for
Deep Learning (DL) testing. As an adequacy criterion, it has been used to assess the …

Quality and Trust in LLM-generated Code

C Spiess, D Gros, KS Pai, M Pradel… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Machine learning models are widely used but can also often be wrong. Users
would benefit from a reliable indication of whether a given output from a given model should …

Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study

Z Wang, S Xu, L Fan, X Cai, L Li, Z Liu - ACM Transactions on Software …, 2024 - dl.acm.org
Quality assurance of deep neural networks (DNNs) is crucial for the deployment of DNN-based
software, especially in mission- and safety-critical tasks. Inspired by structural white …

Keeper: Automated Testing and Fixing of Machine Learning Software

C Wan, S Liu, S Xie, Y Liu, H Hoffmann… - ACM Transactions on …, 2024 - dl.acm.org
The increasing number of software applications incorporating machine learning (ML)
solutions has led to the need for testing techniques. However, testing ML software requires …

Resource‐adaptive and OOD‐robust inference of deep neural networks on IoT devices

C Robertson, NA Tong, TT Nguyen… - CAAI Transactions …, 2024 - Wiley Online Library
Efficiently executing inference tasks of deep neural networks on devices with limited
resources poses a significant load in IoT systems. To alleviate the load, one innovative …