Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

Selective data acquisition in the wild for model charging

C Chai, J Liu, N Tang, G Li, Y Luo - Proceedings of the VLDB …, 2022 - dl.acm.org
The lack of sufficient labeled data is a key bottleneck for practitioners in many real-world
supervised machine learning (ML) tasks. In this paper, we study a new problem, namely …

Human-in-the-loop outlier detection

C Chai, L Cao, G Li, J Li, Y Luo, S Madden - Proceedings of the 2020 …, 2020 - dl.acm.org
Outlier detection is critical to a large number of applications from finance fraud detection to
health care. Although numerous approaches have been proposed to automatically detect …

Trustworthy AI-based Performance Diagnosis Systems for Cloud Applications: A Review

R Xin, J Wang, P Chen, Z Zhao - ACM Computing Surveys, 2025 - dl.acm.org
Performance diagnosis systems are defined as detecting abnormal performance
phenomena and play a crucial role in cloud applications. An effective performance …

Contact tracing incentive for COVID-19 and other pandemic diseases from a crowdsourcing perspective

P Wang, C Lin, MS Obaidat, Z Yu… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
Governments of the world have invested a lot of manpower and material resources to
combat COVID-19 this year. At this moment, the most efficient way that could stop the …

Interactive cleaning for progressive visualization through composite questions

Y Luo, C Chai, X Qin, N Tang… - 2020 IEEE 36th …, 2020 - ieeexplore.ieee.org
In this paper, we study the problem of interactive cleaning for progressive visualization
(ICPV): Given a bad visualization V, it is to obtain a "cleaned" visualization V whose distance …

Interactively discovering and ranking desired tuples by data exploration

X Qin, C Chai, Y Luo, T Zhao, N Tang, G Li, J Feng… - The VLDB Journal, 2022 - Springer
Data exploration—the problem of extracting knowledge from database even if we do not
know exactly what we are looking for—is important for data discovery and analysis …

Automatic data acquisition for deep learning

J Liu, F Zhu, C Chai, Y Luo, N Tang - Proceedings of the VLDB …, 2021 - dl.acm.org
Deep learning (DL) has widespread applications and has revolutionized many industries.
Although automated machine learning (AutoML) can help us away from coding for DL …

Hint: harnessing the wisdom of crowds for handling multi-phase tasks

Y Fang, P Chen, T Han - Neural Computing and Applications, 2023 - Springer
The resourcefulness of crowdsourcing can be used to handle a wide range of complex
macro-tasks, such as travel planning, translation, and software development. Multi-phase …

Combining ad hoc text mining and descriptive analytics to investigate public EV charging prices in the United States

D Trinko, E Porter, J Dunckley, T Bradley, T Coburn - Energies, 2021 - mdpi.com
Electric vehicle (EV) charging infrastructure is present all over the United States, but
charging prices vary greatly, both in amount and in the methods by which they are assessed …