Challenges in deploying machine learning: a survey of case studies
In recent years, machine learning has transitioned from a field of academic research interest
to a field capable of solving real-world business problems. However, the deployment of …
Knowledge graph quality management: a comprehensive survey
B Xue, L Zou - IEEE Transactions on Knowledge and Data …, 2022 - ieeexplore.ieee.org
As a powerful expression of human knowledge in a structural form, knowledge graph (KG)
has drawn great attention from both the academia and the industry and a large number of …
Holistic evaluation of language models
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …
Can foundation models wrangle your data?
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …
[HTML] A benchmark for data imputation methods
With the increasing importance and complexity of data pipelines, data quality became one of
the key challenges in modern software applications. The importance of data quality has …
[BOOK] Data cleaning
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …
Holoclean: Holistic data repairs with probabilistic inference
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …
Creating embeddings of heterogeneous relational datasets for data integration tasks
Deep learning based techniques have been recently used with promising results for data
integration problems. Some methods directly use pre-trained embeddings that were trained …
Large language model for table processing: A survey
Tables, typically two-dimensional and structured to store large amounts of data, are
essential in daily activities like database queries, spreadsheet manipulations, Web table …
Holodetect: Few-shot learning for error detection
We introduce a few-shot learning framework for error detection. We show that data
augmentation (a form of weak supervision) is key to training high-quality, ML-based error …