Automated data processing and feature engineering for deep learning and big data applications: a survey

A Mumuni, F Mumuni - Journal of Information and Intelligence, 2024 - Elsevier
Modern approach to artificial intelligence (AI) aims to design algorithms that learn directly
from data. This approach has achieved impressive results and has contributed significantly …

AutoML: A systematic review on automated machine learning with neural architecture search

I Salehin, MS Islam, P Saha, SM Noman, A Tuni… - Journal of Information …, 2024 - Elsevier
AutoML (Automated Machine Learning) is an emerging field that aims to automate
the process of building machine learning models. AutoML emerged to increase productivity …

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

N Sambasivan, S Kapania, H Highfill… - proceedings of the …, 2021 - dl.acm.org
AI models are increasingly applied in high-stakes domains like health and conservation.
Data quality carries an elevated significance in high-stakes AI due to its heightened …

AutoML: A survey of the state-of-the-art

X He, K Zhao, X Chu - Knowledge-based systems, 2021 - Elsevier
Deep learning (DL) techniques have obtained remarkable achievements on various tasks,
such as image recognition, object detection, and language modeling. However, building a …

Machine learning testing: Survey, landscapes and horizons

JM Zhang, M Harman, L Ma… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This paper provides a comprehensive survey of techniques for testing machine learning
systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing …

A survey on data collection for machine learning: a big data-AI integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Data validation for machine learning

N Polyzotis, M Zinkevich, S Roy… - … of machine learning …, 2019 - proceedings.mlsys.org
Machine learning is a powerful tool for gleaning knowledge from massive amounts
of data. While a great deal of machine learning research has focused on improving the …

Measuring the effect of training data on deep learning predictions via randomized experiments

J Lin, A Zhang, M Lécuyer, J Li… - … on Machine Learning, 2022 - proceedings.mlr.press
We develop a new, principled algorithm for estimating the contribution of training data points
to the behavior of a deep learning model, such as a specific prediction it makes. Our …

Similarity encoding for learning with dirty categorical variables

P Cerda, G Varoquaux, B Kégl - Machine Learning, 2018 - Springer
For statistical learning, categorical variables in a table are usually considered as discrete
entities and encoded separately to feature vectors, e.g., with one-hot encoding. “Dirty” non …

Automating large-scale data quality verification

S Schelter, D Lange, P Schmidt, M Celikel… - Proceedings of the …, 2018 - dl.acm.org
Modern companies and institutions rely on data to guide every single business process and
decision. Missing or incorrect information seriously compromises any decision process …