GSM: A generalized approach to Supervised Meta-blocking for scalable entity resolution

L Gagliardelli, G Papadakis, G Simonini… - Information Systems, 2024 - Elsevier
Entity Resolution (ER) constitutes a core data integration task that relies on Blocking in order
to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall …

(Almost) all of entity resolution

O Binette, RC Steorts - Science Advances, 2022 - science.org
Whether the goal is to estimate the number of people that live in a congressional district, to
estimate the number of individuals that have died in an armed conflict, or to disambiguate …

TCUDB: Accelerating database with tensor processors

YC Hu, Y Li, HW Tseng - … of the 2022 International Conference on …, 2022 - dl.acm.org
The emergence of novel hardware accelerators has powered the tremendous growth of
machine learning in recent years. These accelerators deliver incomparable performance …

Generalized supervised meta-blocking

L Gagliardelli, G Papadakis, G Simonini… - Proceedings of the …, 2022 - iris.unimore.it
Entity Resolution is a core data integration task that relies on Blocking to scale to large
datasets. Schema-agnostic blocking achieves very high recall, requires no domain …

A big data platform exploiting auditable tokenization to promote good practices inside local energy communities

L Gagliardelli, L Zecchini, L Ferretti… - Future Generation …, 2023 - Elsevier
The Energy Community Platform (ECP) is a modular system conceived to promote a
conscious use of energy by the users inside local energy communities. It is composed of two …

BrewER: Entity Resolution On-Demand

L Zecchini, G Simonini, S Bergamaschi… - Proceedings of the …, 2023 - dl.acm.org
The task of entity resolution (ER) aims to detect multiple records describing the same real-
world entity in datasets and to consolidate them into a single consistent record. ER plays a …

ECDP: A big data platform for the smart monitoring of local energy communities

L Gagliardelli, L Zecchini, D Beneventano… - CEUR Workshop …, 2022 - iris.unimore.it
In this paper we present the Energy Community Data Platform (ECDP), a middleware
platform designed to support the collection and the analysis of big data about the energy …

SparkDWM: a scalable design of a Data Washing Machine using Apache Spark

NKA Hagan, JR Talburt - Frontiers in Big Data, 2024 - frontiersin.org
Data volume has been one of the fast-growing assets of most real-world applications. This
increases the rate of human errors such as duplication of records, misspellings, and …

Deep and collective entity resolution in parallel

T Deng, W Fan, P Lu, X Luo, X Zhu… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
This paper studies deep and collective entity resolution (ER). As opposed to a single pass of
pairwise comparison of tuples in a single table, deep ER recursively identifies tuples that …

Incremental Entity Blocking over Heterogeneous Streaming Data

TB Araújo, K Stefanidis, CES Pires, J Nummenmaa… - Information, 2022 - mdpi.com
Web systems have become a valuable source of semi-structured and streaming data. In this
sense, Entity Resolution (ER) has become a key solution for integrating multiple data …