Graph integration of structured, semistructured and unstructured data for data journalism
Digital data is a gold mine for modern journalism. However, datasets which interest
journalists are extremely heterogeneous, ranging from highly structured (relational …
Multi-Source Data Repairing: A Comprehensive Survey
In the era of Big Data, integrating information from multiple sources has proven valuable in
various fields. To ensure a high-quality supply of multi-source data, repairing different types …
Evaluation of duplicate detection algorithms: From quality measures to test data generation
Duplicate detection identifies multiple records in a dataset that represent the same real-
world object. Many such approaches exist, both in research and in industry. To investigate …
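Pairwise precision, recall and F1 are a common way to score a duplicate detection result against a gold standard. The minimal Python sketch below assumes that kind of pair-based evaluation. It is not the set of quality measures or the test-data generator proposed in the paper, and the function names `true_pairs` and `pair_quality` and the record IDs are hypothetical.

```python
from itertools import combinations


def true_pairs(clusters):
    """Expand gold-standard clusters of record IDs into the set of duplicate pairs."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pair_quality(predicted, gold):
    """Pairwise precision, recall and F1 of predicted duplicate pairs against gold pairs."""
    # Normalise every pair so (a, b) and (b, a) count as the same pair.
    predicted = {tuple(sorted(p)) for p in predicted}
    gold = {tuple(sorted(p)) for p in gold}
    tp = len(predicted & gold)  # correctly detected duplicate pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold standard: r1, r2, r3 describe one object; r4, r5 another.
gold = true_pairs([{"r1", "r2", "r3"}, {"r4", "r5"}])
predicted = {("r2", "r1"), ("r1", "r3"), ("r4", "r6")}
print(pair_quality(predicted, gold))
```

Here two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3 and recall is 1/2.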
A probabilistic data fusion modeling approach for extracting true values from uncertain and conflicting attributes
Real-world data obtained from integrating heterogeneous data sources are often multi-
valued, uncertain, imprecise, error-prone, outdated, and have different degrees of accuracy …
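For contrast with the probabilistic model in this paper, the sketch below shows a much simpler technique for resolving conflicting attribute values: iterative weighted voting. Each object's value is chosen by summing the weights of the sources that claim it. Each source's weight is then reset to its agreement rate with the chosen values. This is an assumed illustration, not the authors' model, and `fuse` and the example claims are hypothetical.

```python
from collections import defaultdict


def fuse(claims, iterations=10):
    """Iterative weighted voting over conflicting (source, object, value) claims.

    Returns (truths, weights): the chosen value per object and a
    reliability weight in [0, 1] per source.
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # start by trusting every source equally
    truths = {}
    for _ in range(iterations):
        # Step 1: for each object, pick the value with the largest summed source weight.
        votes = defaultdict(lambda: defaultdict(float))
        for s, obj, val in claims:
            votes[obj][val] += weights[s]
        truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}
        # Step 2: a source's new weight is the fraction of its claims that match the truths.
        agree = defaultdict(int)
        total = defaultdict(int)
        for s, obj, val in claims:
            total[s] += 1
            agree[s] += val == truths[obj]
        weights = {s: agree[s] / total[s] for s in sources}
    return truths, weights


claims = [
    ("A", "capital_AU", "Canberra"),
    ("B", "capital_AU", "Canberra"),
    ("C", "capital_AU", "Sydney"),
    ("A", "capital_CA", "Ottawa"),
    ("B", "capital_CA", "Ottawa"),
    ("C", "capital_CA", "Toronto"),
    ("C", "capital_NZ", "Auckland"),  # only A and C report this object
    ("A", "capital_NZ", "Wellington"),
]
print(fuse(claims))
```

Plain voting ties on `capital_NZ`. After one round, the reweighting trusts source A, which agrees with B elsewhere, more than C, so the fused value becomes Wellington.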
Towards deep entity resolution via soft schema matching
C Sun, D Shen - Neurocomputing, 2022 - Elsevier
Entity resolution (ER) plays a key role in data preprocessing. ER identifies records
corresponding to the same real-world entity. Recent years have witnessed a growing trend …
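A minimal, non-learned baseline for the ER task described above compares every pair of records by the Jaccard similarity of their word tokens and accepts pairs above a threshold. Tokens are drawn from attribute values only, so records whose attributes have different names can still be compared. This is a hypothetical sketch, not the soft schema matching model of Sun and Shen; `match_pairs` and the 0.5 threshold are assumptions.

```python
import re
from itertools import combinations


def tokens(record):
    """Lower-cased word tokens drawn from all attribute values of a record."""
    text = " ".join(str(v) for v in record.values())
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def match_pairs(records, threshold=0.5):
    """Return pairs of record IDs whose token sets have Jaccard similarity >= threshold."""
    toks = {rid: tokens(rec) for rid, rec in records.items()}
    matches = []
    # All-pairs comparison is quadratic; real ER systems add a blocking step first.
    for r1, r2 in combinations(sorted(records), 2):
        if jaccard(toks[r1], toks[r2]) >= threshold:
            matches.append((r1, r2))
    return matches


# Hypothetical records with differing attribute names.
records = {
    "a1": {"name": "Apple iPhone 13", "brand": "Apple"},
    "b7": {"title": "iPhone 13 by Apple", "maker": "apple"},
    "c3": {"name": "Samsung Galaxy S21", "brand": "Samsung"},
}
print(match_pairs(records))
```

Records a1 and b7 share three of four distinct tokens (similarity 0.75) and are matched; c3 shares none with either.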
Reproducible experiments on three-dimensional entity resolution with jedai
In Papadakis et al. (2020), we presented the latest release of JedAI, an open-source
Entity Resolution (ER) system that allows for building a large variety of end-to-end ER …
Tab2know: Building a knowledge base from tables in scientific papers
Tables in scientific papers contain a wealth of valuable knowledge for the scientific
enterprise. To help the many of us who frequently consult this type of knowledge, we present …
Fine-Grained Tasks for Crowdsourced Entity Resolution
T Nie, H Mao, X Liu, S Yu - Applied Sciences, 2024 - mdpi.com
Entity resolution aims to identify records that point to the same entity in the real world. In
recent years, crowdsourcing approaches have provided new ideas for entity resolution …
Fuzzy Integration of Data Lake Tables
Data integration is an important step in any data science pipeline where the objective is to
unify the information available in different datasets for comprehensive analysis. Full …
Mixed hierarchical networks for deep entity matching
CC Sun, DR Shen - Journal of Computer Science and Technology, 2021 - Springer
Entity matching is a fundamental problem of data integration. It groups records according to
underlying real-world entities. There is a growing trend of entity matching via deep learning …
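Once a matcher has labelled record pairs as matching, one simple way to form the groups of records described above is to take the transitive closure of the matched pairs with a union-find structure. The sketch assumes the pairwise decisions are already available; they could come from any matcher, including a deep model. `cluster` and the record IDs are hypothetical.

```python
def cluster(record_ids, matched_pairs):
    """Group records into entities by taking the transitive closure of matched pairs."""
    parent = {r: r for r in record_ids}  # each record starts as its own entity

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two entity groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


ids = ["r1", "r2", "r3", "r4", "r5"]
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
print(cluster(ids, pairs))
```

Transitive closure is cheap, but a single false match can chain two distinct entities together. For this reason, ER pipelines often use stricter clustering algorithms instead.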