A survey of deep active learning

P Ren, Y Xiao, X Chang, PY Huang, Z Li… - ACM Computing …, 2021 - dl.acm.org
Active learning (AL) attempts to maximize a model's performance gain while annotating the
fewest samples possible. Deep learning (DL) is greedy for data and requires a large amount …

An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Deep entity matching with pre-trained language models

Y Li, J Li, Y Suhara, AH Doan, WC Tan - arXiv preprint arXiv:2004.00584, 2020 - arxiv.org
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …

DeepER--Deep Entity Resolution

M Ebraheem, S Thirumuruganathan, S Joty… - arXiv preprint arXiv …, 2017 - arxiv.org
Entity resolution (ER) is a key data integration problem. Despite the efforts in 70+ years in all
aspects of ER, there is still a high demand for democratizing ER-humans are heavily …

Low-resource deep entity resolution with transfer and active learning

J Kasai, K Qian, S Gurajada, Y Li, L Popa - arXiv preprint arXiv …, 2019 - arxiv.org
Entity resolution (ER) is the task of identifying different representations of the same real-
world entities across databases. It is a key step for knowledge base creation and text mining …

Entity resolution with hierarchical graph attention networks

D Yao, Y Gu, G Cong, H Jin, X Lv - Proceedings of the 2022 International …, 2022 - dl.acm.org
Entity Resolution (ER) links entities that refer to the same real-world entity from different
sources. Existing work usually takes pairs of entities as input and judges those pairs …

Hydra: Large-scale social identity linkage via heterogeneous behavior modeling

S Liu, S Wang, F Zhu, J Zhang, R Krishnan - Proceedings of the 2014 …, 2014 - dl.acm.org
We study the problem of large-scale social identity linkage across different social media
platforms, which is of critical importance to business intelligence by gaining from social data …

iCrowd: An adaptive crowdsourcing framework

J Fan, G Li, BC Ooi, K Tan, J Feng - Proceedings of the 2015 ACM …, 2015 - dl.acm.org
Crowdsourcing is widely accepted as a means for resolving tasks that machines are not
good at. Unfortunately, crowdsourcing may yield relatively low-quality results if there is no …

Synthesizing entity matching rules by examples

R Singh, VV Meduri, A Elmagarmid, S Madden… - Proceedings of the …, 2017 - dl.acm.org
Entity matching (EM) is a critical part of data integration. We study how to synthesize entity
matching rules from positive-negative matching examples. The core of our solution is …

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …