Opportunities and challenges in data-centric AI

S Kumar, S Datta, V Singh, SK Singh, R Sharma - IEEE Access, 2024 - ieeexplore.ieee.org
Artificial intelligence (AI) systems are trained to solve complex problems and learn to
perform specific tasks by using large volumes of data, such as prediction, classification …

Data collection and quality challenges in deep learning: A data-centric AI perspective

SE Whang, Y Roh, H Song, JG Lee - The VLDB Journal, 2023 - Springer
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …

A systematic literature review on data quality assessment

O Reda, NC Benabdellah, A Zellou - Bulletin of Electrical Engineering and …, 2023 - beei.org
Defining and evaluating data quality can be a complex task as it varies depending on the
specific purpose for which the data is intended. To effectively assess data quality, it is …

Advances in exploratory data analysis, visualisation and quality for data centric AI systems

H Patel, S Guttula, RS Mittal, N Manwani… - Proceedings of the 28th …, 2022 - dl.acm.org
It is widely accepted that data preparation is one of the most time-consuming steps of the
machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of …

Taming technical bias in machine learning pipelines

S Schelter, J Stoyanovich - Bulletin of the Technical Committee on Data …, 2020 - par.nsf.gov
Machine Learning (ML) is commonly used to automate decisions in domains as varied as
credit and lending, medical diagnosis, and hiring. These decisions are consequential …

Information retrieval versus deep learning approaches for generating traceability links in bilingual projects

J Lin, Y Liu, J Cleland-Huang - Empirical Software Engineering, 2022 - Springer
Software traceability links are established between diverse artifacts of the software
development process in order to support tasks such as compliance analysis, safety …

Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI

S Grafberger, Z Zhang, S Schelter… - IEEE Data Eng …, 2024 - stefan-grafberger.com
Software systems that learn from data with AI and machine learning (ML) are becoming
ubiquitous and are increasingly used to automate impactful decisions. The risks arising from …

Auto-validate: Unsupervised data validation using data-domain patterns inferred from data lakes

J Song, Y He - Proceedings of the 2021 International Conference on …, 2021 - dl.acm.org
Complex data pipelines are increasingly common in diverse applications such as BI
reporting and ML modeling. These pipelines often recur regularly (e.g., daily or weekly), as BI …