An overview of end-to-end entity resolution for big data
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …
Big data challenge: a data management perspective
There is a trend that virtually everyone, ranging from big Web companies to traditional
enterprises to physical science researchers to social scientists, is either already …
Deep learning for entity matching: A design space exploration
Entity matching (EM) finds data instances that refer to the same real-world entity. In this
paper we examine applying deep learning (DL) to EM, to understand DL's benefits and …
[BOOK][B] Data cleaning
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …
Holoclean: Holistic data repairs with probabilistic inference
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …
Profiling relational data: a survey
Profiling data to determine metadata about a given dataset is an important and frequent
activity of any IT professional and researcher and is necessary for various use-cases. It …
Holistic data cleaning: Putting violations into context
Data cleaning is an important problem and data quality rules are the most promising way to
face it with a declarative approach. Previous work has focused on specific formalisms, such …
NADEEF: a commodity data cleaning system
Despite the increasing importance of data quality and the rich theoretical and practical
contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf …
Toward effective big data analysis in continuous auditing
J Zhang, X Yang, D Appelbaum - Accounting Horizons, 2015 - publications.aaahq.org
Big Data now pervades every sector and function of the global economy. This paper focuses
on the gaps between Big Data and the current capabilities of data analysis in continuous …
A survey on blocking technology of entity resolution
Entity resolution (ER) is a significant task in data integration, which aims to detect all entity
profiles that correspond to the same real-world entity. Due to its inherently quadratic …