An overview of end-to-end entity resolution for big data
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …
Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
[BOOK][B] The data matching process
P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …
A survey of indexing techniques for scalable record linkage and deduplication
P Christen - IEEE transactions on knowledge and data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …
Duplicate record detection: A survey
Often, in the real world, entities have two or more representations in databases. Duplicate
records do not share a common key and/or they contain errors that make duplicate matching …
Robust and fast similarity search for moving object trajectories
An important consideration in similarity-based retrieval of moving object trajectories is the
definition of a distance function. The existing distance functions are usually sensitive to …
Data-Centric Systems and Applications
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …
[BOOK][B] Exploratory data mining and data cleaning
T Dasu, T Johnson - 2003 - books.google.com
Written for practitioners of data mining, data cleaning and database management. Presents
a technical treatment of data quality including process, metrics, tools and algorithms …
[BOOK][B] Similarity search: the metric space approach
The area of similarity searching is a very hot topic for both research and commercial
applications. Current data processing applications use data with considerably less structure …
A survey on blocking technology of entity resolution
Entity resolution (ER) is a significant task in data integration, which aims to detect all entity
profiles that correspond to the same real-world entity. Due to its inherently quadratic …