[HTML] GSM: A generalized approach to Supervised Meta-blocking for scalable entity resolution

L Gagliardelli, G Papadakis, G Simonini… - Information Systems, 2024 - Elsevier
Entity Resolution (ER) constitutes a core data integration task that relies on Blocking in order
to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall …

A big data platform exploiting auditable tokenization to promote good practices inside local energy communities

L Gagliardelli, L Zecchini, L Ferretti… - Future Generation …, 2023 - Elsevier
The Energy Community Platform (ECP) is a modular system conceived to promote a
conscious use of energy by the users inside local energy communities. It is composed of two …

Open benchmark for filtering techniques in entity resolution

F Neuhof, M Fisichella, G Papadakis, K Nikoletos… - The VLDB Journal, 2024 - Springer
Entity Resolution identifies entity profiles that represent the same real-world object. A
brute-force approach that considers all pairs of entities suffers from quadratic time complexity. To …

SC-block: Supervised contrastive blocking within entity resolution pipelines

A Brinkmann, R Shraga, C Bizer - European Semantic Web Conference, 2024 - Springer
Millions of websites use the schema.org vocabulary to annotate structured data describing
products, local businesses, or events within their HTML pages. Integrating schema.org data …

Duplicate table discovery with xash

M Koch, M Esmailoghli, S Auer, Z Abedjan - 2023 - dl.gi.de
Data lakes are typically lightly curated and as such prone to data quality problems and
inconsistencies. In particular, duplicate tables are common in most repositories. The goal of …

Connected Components for Scaling Partial-Order Blocking to Billion Entities

T Backes, S Dietze - ACM Journal of Data and Information Quality, 2024 - dl.acm.org
In entity resolution, blocking pre-partitions data for further processing by more expensive
methods. Two entity mentions are in the same block if they share identical or related …

[BOOK] Duplicate Table Detection with Xash

M Koch, M Esmailoghli, S Auer, Z Abedjan - 2023 - repo.uni-hannover.de
Data lakes are typically lightly curated and as such prone to data quality problems and
inconsistencies. In particular, duplicate tables are common in most repositories. The goal of …

Progressive entity resolution with node embeddings

G Simonini, L Gagliardelli, M Rinaldi… - CEUR WORKSHOP …, 2022 - iris.unimore.it
Entity Resolution (ER) is the task of finding records that refer to the same real-world entity,
which are called matches. ER is a fundamental pre-processing step when dealing with dirty …

Big Data Integration for Data-Centric AI

S Bergamaschi, D Beneventano, G Simonini… - 1st Italian Conference …, 2022 - iris.unimore.it
Big data integration represents one of the main challenges for the use of techniques and
tools based on Artificial Intelligence (AI) in several crucial areas: eHealth, energy …

Integrazione di dati on-demand

L Zecchini - 2024 - iris.unimore.it
Companies and organizations depend heavily on their data to make informed business
decisions. Therefore, guaranteeing high data quality is critical to ensure the reliability of data …