An overview of end-to-end entity resolution for big data
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …
Big data systems: A software engineering perspective
A Davoudian, M Liu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big Data Systems (BDSs) are an emerging class of scalable software technologies whereby
massive amounts of heterogeneous data are gathered from multiple sources, managed …
Can foundation models wrangle your data?
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …
Deep entity matching with pre-trained language models
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …
Deep learning for blocking in entity matching: a design space exploration
Entity matching (EM) finds data instances that refer to the same real-world entity. Most EM
solutions perform blocking then matching. Many works have applied deep learning (DL) to …
[BOOK][B] The four generations of entity resolution
Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of
the research examines ways for improving its effectiveness and time efficiency. The initial …
Linking sensitive data
Sensitive personal data are created in many application domains, and there is now an
increasing demand to share, integrate, and link such data within and across organisations in …
RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation
Can AI help automate human-easy but computer-hard data preparation tasks that burden
data scientists, practitioners, and crowd workers? We answer this question by presenting …
Cost-effective in-context learning for entity resolution: A design space exploration
Entity resolution (ER) is an important data integration task with a wide spectrum of
applications. The state-of-the-art solutions on ER rely on pre-trained language models …
Pre-trained embeddings for entity resolution: an experimental analysis
Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving
language models to improve effectiveness. This is applied to both main steps of ER, i.e. …