An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Big data challenge: a data management perspective

J Chen, Y Chen, X Du, C Li, J Lu, S Zhao… - Frontiers of computer …, 2013 - Springer
There is a trend that virtually everyone, ranging from big Web companies to traditional
enterprises to physical science researchers to social scientists, is either already …

Deep learning for entity matching: A design space exploration

S Mudgal, H Li, T Rekatsinas, AH Doan… - Proceedings of the …, 2018 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. In this
paper we examine applying deep learning (DL) to EM, to understand DL's benefits and …

[BOOK] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

HoloClean: Holistic data repairs with probabilistic inference

T Rekatsinas, X Chu, IF Ilyas, C Ré - arXiv preprint arXiv:1702.00820, 2017 - arxiv.org
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …

Profiling relational data: a survey

Z Abedjan, L Golab, F Naumann - The VLDB Journal, 2015 - Springer
Profiling data to determine metadata about a given dataset is an important and frequent
activity of any IT professional and researcher and is necessary for various use-cases. It …

Holistic data cleaning: Putting violations into context

X Chu, IF Ilyas, P Papotti - 2013 IEEE 29th International …, 2013 - ieeexplore.ieee.org
Data cleaning is an important problem and data quality rules are the most promising way to
face it with a declarative approach. Previous work has focused on specific formalisms, such …

NADEEF: a commodity data cleaning system

M Dallachiesa, A Ebaid, A Eldawy… - Proceedings of the …, 2013 - dl.acm.org
Despite the increasing importance of data quality and the rich theoretical and practical
contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf …

Toward effective big data analysis in continuous auditing

J Zhang, X Yang, D Appelbaum - Accounting Horizons, 2015 - publications.aaahq.org
Big Data now pervades every sector and function of the global economy. This paper focuses
on the gaps between Big Data and the current capabilities of data analysis in continuous …

A survey on blocking technology of entity resolution

BH Li, Y Liu, AM Zhang, WH Wang, S Wan - Journal of Computer Science …, 2020 - Springer
Entity resolution (ER) is a significant task in data integration, which aims to detect all entity
profiles that correspond to the same real-world entity. Due to its inherently quadratic …