Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
Deep entity matching: Challenges and opportunities
Entity matching refers to the task of determining whether two different representations refer
to the same real-world entity. It continues to be a prevalent problem for many organizations …
Semantics-aware dataset discovery from data lakes with contextualized column-based representation learning
Dataset discovery from data lakes is essential in many real application scenarios. In this
paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes …
Efficient joinable table discovery in data lakes: A high-dimensional similarity-based approach
Finding joinable tables in data lakes is a key procedure in many applications such as data
integration, data augmentation, data analysis, and data market. Traditional approaches that …
Deep learning approaches for similarity computation: A survey
The requirement for appropriate ways to measure the similarity between data objects is a
common but vital task in various domains, such as data mining, machine learning and so on …
Machop: an end-to-end generalized entity matching framework
Real-world applications frequently seek to solve a general form of the Entity Matching (EM)
problem to find associated entities. Such scenarios include matching jobs to candidates in …
OmniMatch: Effective Self-Supervised Any-Join Discovery in Tabular Data Repositories
How can we discover join relationships among columns of tabular data in a data repository?
Can this be done effectively when metadata is missing? Traditional column matching works …
CrowdMed-II: a blockchain-based framework for efficient consent management in health data sharing
The healthcare industry faces serious problems with health data. Firstly, health data is
fragmented and its quality needs to be improved. Data fragmentation means that it is difficult …
A transformation-based framework for KNN set similarity search
Set similarity search is a fundamental operation in a variety of applications. While many
previous studies focus on threshold based set similarity search and join, few efforts have …
Boosting approximate dictionary-based entity extraction with synonyms
Dictionary-based entity extraction is an important task in many data analysis applications,
such as academic search, document classification, and code auto-debugging. To improve …