Data cleaning and machine learning: a systematic literature review
Abstract Machine Learning (ML) is integrated into a growing number of systems for various
applications. Because the performance of an ML model is highly dependent on the quality of …
Can foundation models wrangle your data?
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …
Construction of knowledge graphs: Current state and challenges
With Knowledge Graphs (KGs) at the center of numerous applications such as recommender
systems and question-answering, the need for generalized pipelines to construct and …
Neo: A learned query optimizer
Query optimization is one of the most challenging problems in database systems. Despite
the progress made over the past decades, query optimizers remain extremely complex …
CleanML: A study for evaluating the impact of data cleaning on ML classification tasks
Data quality affects machine learning (ML) model performance, and data scientists spend
a considerable amount of time on data cleaning before model training. However, to date, there …
Machine learning and data cleaning: Which serves the other?
The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …
Baran: Effective error correction via a unified context representation and transfer learning
Traditional error correction solutions leverage handmade rules or master data to find the
correct values. Both are often amiss in real-world scenarios. Therefore, it is desirable to …
VerifAI: verified generative AI
Generative AI has made significant strides, yet concerns about the accuracy and reliability of
its outputs continue to grow. Such inaccuracies can have serious consequences such as …
Rotom: A meta-learned data augmentation framework for entity matching, data cleaning, text classification, and beyond
Deep Learning revolutionizes almost all fields of computer science including data
management. However, the demand for high-quality training data is slowing down deep …
Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks
Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …