An overview of end-to-end entity resolution for big data
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …
A survey of machine learning for big data processing
There is no doubt that big data are now rapidly expanding in all science and engineering
domains. While the potential of these massive data is undoubtedly significant, fully making …
[BOOK] The data matching process
P Christen, P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …
A survey of indexing techniques for scalable record linkage and deduplication
P Christen - IEEE transactions on knowledge and data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …
Frameworks for entity matching: A comparison
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks
provide several methods and their combination to effectively solve different match tasks. In …
[BOOK] The four generations of entity resolution
Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of
the research examines ways for improving its effectiveness and time efficiency. The initial …
Learning similarity metrics for event identification in social media
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for
users looking to share their experiences and interests on the Web. These sites host …
Adaptive blocking: Learning to scale up record linkage
Many data mining tasks require computing similarity between pairs of objects. Pairwise
similarity computations are particularly important in record linkage systems, as well as in …
Data-Centric Systems and Applications
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …
On active learning of record matching packages
A Arasu, M Götz, R Kaushik - Proceedings of the 2010 ACM SIGMOD …, 2010 - dl.acm.org
We consider the problem of learning a record matching package (classifier) in an active
learning setting. In active learning, the learning algorithm picks the set of examples to be …