An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Deep entity matching with pre-trained language models

Y Li, J Li, Y Suhara, AH Doan, WC Tan - arXiv preprint arXiv:2004.00584, 2020 - arxiv.org
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …

A survey on data collection for machine learning: a big data-AI integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Deep learning for entity matching: A design space exploration

S Mudgal, H Li, T Rekatsinas, AH Doan… - Proceedings of the …, 2018 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. In this
paper we examine applying deep learning (DL) to EM, to understand DL's benefits and …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

DeepER--Deep Entity Resolution

M Ebraheem, S Thirumuruganathan, S Joty… - arXiv preprint arXiv …, 2017 - arxiv.org
Entity resolution (ER) is a key data integration problem. Despite the efforts in 70+ years in all
aspects of ER, there is still a high demand for democratizing ER - humans are heavily …

Creating embeddings of heterogeneous relational datasets for data integration tasks

R Cappuzzo, P Papotti… - Proceedings of the 2020 …, 2020 - dl.acm.org
Deep learning based techniques have been recently used with promising results for data
integration problems. Some methods directly use pre-trained embeddings that were trained …

Deep learning for blocking in entity matching: a design space exploration

S Thirumuruganathan, H Li, N Tang… - Proceedings of the …, 2021 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. Most EM
solutions perform blocking then matching. Many works have applied deep learning (DL) to …