Automated data processing and feature engineering for deep learning and big data applications: a survey
A Mumuni, F Mumuni - Journal of Information and Intelligence, 2024 - Elsevier
Modern approach to artificial intelligence (AI) aims to design algorithms that learn directly
from data. This approach has achieved impressive results and has contributed significantly …
AutoML: A systematic review on automated machine learning with neural architecture search
AutoML (Automated Machine Learning) is an emerging field that aims to automate
the process of building machine learning models. AutoML emerged to increase productivity …
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
AI models are increasingly applied in high-stakes domains like health and conservation.
Data quality carries an elevated significance in high-stakes AI due to its heightened …
AutoML: A survey of the state-of-the-art
Deep learning (DL) techniques have obtained remarkable achievements on various tasks,
such as image recognition, object detection, and language modeling. However, building a …
Machine learning testing: Survey, landscapes and horizons
This paper provides a comprehensive survey of techniques for testing machine learning
systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing …
A survey on data collection for machine learning: a big data-ai integration perspective
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …
Data validation for machine learning
Machine learning is a powerful tool for gleaning knowledge from massive amounts
of data. While a great deal of machine learning research has focused on improving the …
Measuring the effect of training data on deep learning predictions via randomized experiments
We develop a new, principled algorithm for estimating the contribution of training data points
to the behavior of a deep learning model, such as a specific prediction it makes. Our …
Similarity encoding for learning with dirty categorical variables
For statistical learning, categorical variables in a table are usually considered as discrete
entities and encoded separately to feature vectors, e.g., with one-hot encoding. “Dirty” non …
Automating large-scale data quality verification
Modern companies and institutions rely on data to guide every single business process and
decision. Missing or incorrect information seriously compromises any decision process …