Data cleaning and machine learning: a systematic literature review

PO Côté, A Nikanjam, N Ahmed, D Humeniuk… - Automated Software …, 2024 - Springer
Machine Learning (ML) is integrated into a growing number of systems for various
applications. Because the performance of an ML model is highly dependent on the quality of …

The battleship approach to the low resource entity matching problem

B Genossar, A Gal, R Shraga - Proceedings of the ACM on Management …, 2023 - dl.acm.org
Entity matching, a core data integration problem, is the task of deciding whether two data
tuples refer to the same real-world entity. Recent advances in deep learning methods, using …

Active deep learning on entity resolution by risk sampling

Y Nafa, Q Chen, Z Chen, X Lu, H He, T Duan… - Knowledge-Based …, 2022 - Elsevier
While the state-of-the-art performance on entity resolution (ER) has been achieved by deep
learning, its effectiveness depends on large quantities of accurately labeled training data. To …

Deep clustering for data cleaning and integration

HT Rauf, A Freitas, NW Paton - arXiv preprint arXiv:2305.13494, 2023 - arxiv.org
Deep Learning (DL) techniques now constitute the state-of-the-art for important problems in
areas such as text and image processing, and there have been impactful results that deploy …

Transformer-based denoising adversarial variational entity resolution

S Li, H Wu - Journal of Intelligent Information Systems, 2023 - Springer
Entity resolution (ER), precisely identifying different representations of the same real-world
entities, is critical for data integration. The ER question has been studied for many years …

Low-resource entity resolution with domain generalization and active learning

Z Xu, N Wang - Neurocomputing, 2024 - Elsevier
Entity Resolution (ER), a fundamental task in data cleaning and integration, is critical in
various fields such as healthcare, e-commerce, and social networks. Traditional ER methods …

MixER: linear interpolation of latent space for entity resolution

H Wu, S Li - Complex & Intelligent Systems, 2024 - Springer
Entity resolution, accurately identifying various representations of the same real-world
entities, is a crucial part of data integration systems. While existing learning-based models …

A Framework to Evaluate the Quality of Integrated Datasets

FD Buono, G Faggioli, M Paganelli, A Baraldi… - ACM SIGAPP Applied …, 2023 - dl.acm.org
Evaluation is a bottleneck in data integration processes: it is performed by domain experts
through manual onerous data inspections. This task is particularly heavy in real business …

SAREM: semi-supervised active heterogeneous entity matching framework

J Du, T Nie, W Dou, D Shen, Y Kou - International Conference on Web …, 2022 - Springer
Entity matching is a key technique in data quality research, which refers to the identification
of records that refer to the same real-world entity in different data sources. This paper …

Dual-Module Feature Alignment Domain Adversarial Model for Entity Resolution

H Song, M Liu, S Zhang, Q Han - 2024 11th International …, 2024 - ieeexplore.ieee.org
Entity Resolution (ER) is a fundamental task in data integration, aiming to identify data
objects across different sources that refer to the same real-world entity. In recent years, deep …