Large-scale deduplication with constraints using dedupalog
We present a declarative framework for collective deduplication of entity references in the
presence of constraints. Constraints occur naturally in many data cleaning domains and can …
Raven: active learning of link specifications
With the growth of the Linked Data Web, time-efficient approaches for computing links
between data sources have become indispensable. Yet, in many cases, determining the …
Active learning for crowd-sourced databases
B Mozafari, P Sarkar, MJ Franklin, MI Jordan… - ar…
… of active learning for entity resolution
Entity resolution is one of the central challenges when integrating data from large numbers
of data sources. Active learning for entity resolution aims to learn high-quality matching …
Supporting efficient record linkage for large data sets using mapping techniques
This paper describes an efficient approach to record linkage. Given two lists of records, the
record-linkage problem consists of determining all pairs that are similar to each other, where …
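To make the record-linkage problem statement concrete, here is a minimal Python sketch of the naive baseline it implies: compare every pair drawn from two lists and keep the pairs whose similarity clears a threshold. The similarity measure (difflib's `SequenceMatcher.ratio`), the threshold of 0.8, and the names `similarity` and `link_records` are illustrative assumptions. This is not the approach the paper describes; it is only the quadratic all-pairs baseline that more efficient techniques try to avoid.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio; an illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.8):
    """Return every (i, j, score) pair whose similarity is at least `threshold`.

    Naive O(len(left) * len(right)) baseline: every pair is compared,
    with no blocking, indexing, or embedding step to prune candidates.
    """
    matches = []
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        score = similarity(a, b)
        if score >= threshold:
            matches.append((i, j, round(score, 3)))
    return matches


if __name__ == "__main__":
    left = ["John Smith", "Mary Jones", "Ann Lee"]
    right = ["Jon Smith", "M. Jones", "Bob Brown"]
    print(link_records(left, right))
```

The quadratic number of comparisons is what makes this baseline impractical for large data sets, which is the motivation for efficient record-linkage methods.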
GDR: a system for guided data repair
Improving data quality is a time-consuming, labor-intensive and often domain specific
operation. Existing data repair approaches are either fully automated or not efficient in …
Remote robot execution through WWW simulation
This paper presents the state of the art in teleoperation and simulation systems and proposes some
possible network architectures devoted to the development of such systems. A method for …
Independent de-duplication in data cleaning
Many organizations collect large amounts of data to support their business and
decision-making processes. The data originate from a variety of sources that may have …
Scaling up the alias duplicate elimination system: A demonstration
Duplicate elimination is an important stage in integrating data from multiple sources. The
challenges involved are finding a robust deduplication function that can identify when two …
Toward data cleaning with a target accuracy: A case study for value normalization
Many applications need to clean data with a target accuracy, e.g., with at least 95% precision.
As far as we know, this problem has not been studied in depth. In this paper we take the first …
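Checking a cleaned result against a target such as 95% precision can be sketched in Python as follows. The sketch assumes a reviewer labels a random sample of cleaned values as correct or incorrect, and it compares the lower end of a Wilson score interval, rather than the raw sample precision, against the target. The function names, the sampling setup, and the choice of interval are assumptions for illustration, not the method proposed in the paper.

```python
import math


def wilson_lower_bound(correct: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    if n == 0:
        return 0.0
    p = correct / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def meets_target(labels, target=0.95, z=1.96):
    """Check a labeled sample against a precision target.

    labels: booleans, True if a sampled cleaned value was judged correct
    (a hypothetical review step). Returns a tuple of
    (observed precision, Wilson lower bound, lower bound >= target).
    """
    n = len(labels)
    correct = sum(labels)
    precision = correct / n if n else 0.0
    lower = wilson_lower_bound(correct, n, z)
    return precision, lower, lower >= target


if __name__ == "__main__":
    sample = [True] * 97 + [False] * 3
    print(meets_target(sample))
```

With 97 of 100 sampled values correct, the observed precision is 0.97, but the lower bound is roughly 0.92, so this conservative check does not certify the 95% target. A larger sample with a similar or lower error rate (for example 995 correct out of 1,000) would clear it.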