Data cleaning and machine learning: a systematic literature review

PO Côté, A Nikanjam, N Ahmed, D Humeniuk… - Automated Software …, 2024 - Springer
Machine Learning (ML) is integrated into a growing number of systems for various
applications. Because the performance of an ML model is highly dependent on the quality of …

Challenges and strategies for wide-scale artificial intelligence (AI) deployment in healthcare practices: A perspective for healthcare organizations

P Esmaeilzadeh - Artificial Intelligence in Medicine, 2024 - Elsevier
Healthcare organizations have realized that Artificial intelligence (AI) can provide a
competitive edge through personalized patient experiences, improved patient outcomes …

Machine learning to assess and support safe drinking water supply: A systematic review

F Feng, Y Zhang, Z Chen, J Ni, Y Feng, Y Xie… - Journal of …, 2024 - Elsevier
Drinking water is essential to public health and socioeconomic growth. Therefore, assessing
and ensuring drinking water supply is a critical task in modern society. Conventional …

Database meets artificial intelligence: A survey

X Zhou, C Chai, G Li, J Sun - IEEE Transactions on Knowledge …, 2020 - ieeexplore.ieee.org
Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can
make database more intelligent (AI4DB). For example, traditional empirical database …

Cost-based or learning-based? A hybrid query optimizer for query plan selection

X Yu, C Chai, G Li, J Liu - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Traditional cost-based optimizers are efficient and stable to generate optimal plans for
simple SQL queries, but they may not generate high-quality plans for complicated queries …

Coinsight: Visual storytelling for hierarchical tables with connected insights

G Li, R Li, Y Feng, Y Zhang, Y Luo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Extracting data insights and generating visual data stories from tabular data are critical parts
of data analysis. However, most existing studies primarily focus on tabular data stored as flat …

Selective data acquisition in the wild for model charging

C Chai, J Liu, N Tang, G Li, Y Luo - Proceedings of the VLDB …, 2022 - dl.acm.org
The lack of sufficient labeled data is a key bottleneck for practitioners in many real-world
supervised machine learning (ML) tasks. In this paper, we study a new problem, namely …

Feature augmentation with reinforcement learning

J Liu, C Chai, Y Luo, Y Lou, J Feng… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Sufficient good features are indispensable to train well-performed machine learning models.
However, it is common that good features are not always enough, where feature …

Learned data-aware image representations of line charts for similarity search

Y Luo, Y Zhou, N Tang, G Li, C Chai… - Proceedings of the ACM on …, 2023 - dl.acm.org
Finding line-chart images similar to a given line-chart image query is a common task in data
exploration and image query systems, e.g., finding similar trends in stock markets or medical …

Goodcore: Data-effective and data-efficient machine learning through coreset selection over incomplete data

C Chai, J Liu, N Tang, J Fan, D Miao, J Wang… - Proceedings of the …, 2023 - dl.acm.org
Given a dataset with incomplete data (e.g., missing values), training a machine learning
model over the incomplete data requires two steps. First, it requires a data-effective step that …