GSM: A generalized approach to Supervised Meta-blocking for scalable entity resolution
Entity Resolution (ER) constitutes a core data integration task that relies on Blocking in order
to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall …
(Almost) all of entity resolution
Whether the goal is to estimate the number of people that live in a congressional district, to
estimate the number of individuals that have died in an armed conflict, or to disambiguate …
TCUDB: Accelerating database with tensor processors
The emergence of novel hardware accelerators has powered the tremendous growth of
machine learning in recent years. These accelerators deliver incomparable performance …
Generalized supervised meta-blocking
Entity Resolution is a core data integration task that relies on Blocking to scale to large
datasets. Schema-agnostic blocking achieves very high recall, requires no domain …
A big data platform exploiting auditable tokenization to promote good practices inside local energy communities
The Energy Community Platform (ECP) is a modular system conceived to promote a
conscious use of energy by the users inside local energy communities. It is composed of two …
BrewER: Entity Resolution On-Demand
The task of entity resolution (ER) aims to detect multiple records describing the same real-
world entity in datasets and to consolidate them into a single consistent record. ER plays a …
ECDP: A big data platform for the smart monitoring of local energy communities
In this paper we present the Energy Community Data Platform (ECDP), a middleware
platform designed to support the collection and the analysis of big data about the energy …
SparkDWM: a scalable design of a Data Washing Machine using Apache Spark
Data volume has been one of the fast-growing assets of most real-world applications. This
increases the rate of human errors such as duplication of records, misspellings, and …
Deep and collective entity resolution in parallel
This paper studies deep and collective entity resolution (ER). As opposed to a single pass of
pairwise comparison of tuples in a single table, deep ER recursively identifies tuples that …
Incremental Entity Blocking over Heterogeneous Streaming Data
Web systems have become a valuable source of semi-structured and streaming data. In this
sense, Entity Resolution (ER) has become a key solution for integrating multiple data …