Large-scale deduplication with constraints using dedupalog

A Arasu, C Ré, D Suciu - 2009 IEEE 25th International …, 2009 - ieeexplore.ieee.org
We present a declarative framework for collective deduplication of entity references in the
presence of constraints. Constraints occur naturally in many data cleaning domains and can …

Raven-active learning of link specifications

ACN Ngomo, J Lehmann, S Auer, K Höffner - Ontology Matching, 2011 - academia.edu
With the growth of the Linked Data Web, time-efficient approaches for computing links
between data sources have become indispensable. Yet, in many cases, determining the …

Active learning for crowd-sourced databases

B Mozafari, P Sarkar, MJ Franklin, MI Jordan… - arXiv …

Unsupervised bootstrapping of active learning for entity resolution

A Primpeli, C Bizer, M Keuper - European Semantic Web Conference, 2020 - Springer
Entity resolution is one of the central challenges when integrating data from large numbers
of data sources. Active learning for entity resolution aims to learn high-quality matching …

Supporting efficient record linkage for large data sets using mapping techniques

C Li, L Jin, S Mehrotra - World Wide Web, 2006 - Springer
This paper describes an efficient approach to record linkage. Given two lists of records, the
record-linkage problem consists of determining all pairs that are similar to each other, where …

GDR: a system for guided data repair

M Yakout, AK Elmagarmid, J Neville… - Proceedings of the 2010 …, 2010 - dl.acm.org
Improving data quality is a time-consuming, labor-intensive and often domain specific
operation. Existing data repair approaches are either fully automated or not efficient in …

Remote robot execution through WWW simulation

ST Puente, F Torres, F Ortiz… - … Conference on Pattern …, 2000 - ieeexplore.ieee.org
This paper shows the state of art teleoperation and simulation systems and proposes some
possible network architectures devoted to the development of such systems. A method for …

Independent de-duplication in data cleaning

A Udechukwu, C Ezeife, K Barker - Journal of Information and …, 2005 - hrcak.srce.hr
Many organizations collect large amounts of data to support their business and
decision-making processes. The data originate from a variety of sources that may have …

Scaling up the alias duplicate elimination system: A demonstration

S Sarawagi, A Kirpal - Conference on Data Engineering, 2003 - repository.ias.ac.in
Duplicate elimination is an important stage in integrating data from multiple sources. The
challenges involved are finding a robust deduplication function that can identify when two …

Toward data cleaning with a target accuracy: A case study for value normalization

A Ardalan, D Paulsen, AS Saini, W Cai… - … Conference on Big …, 2022 - ieeexplore.ieee.org
Many applications need to clean data with a target accuracy, e.g., with at least 95% precision.
As far as we know, this problem has not been studied in depth. In this paper we take the first …