Graph integration of structured, semistructured and unstructured data for data journalism

AC Anadiotis, O Balalau, C Conceição, H Galhardas… - Information Systems, 2022 - Elsevier
Digital data is a gold mine for modern journalism. However, datasets which interest
journalists are extremely heterogeneous, ranging from highly structured (relational …

Multi-Source Data Repairing: A Comprehensive Survey

C Ye, H Duan, H Zhang, H Zhang, H Wang, G Dai - Mathematics, 2023 - mdpi.com
In the era of Big Data, integrating information from multiple sources has proven valuable in
various fields. To ensure a high-quality supply of multi-source data, repairing different types …

Evaluation of duplicate detection algorithms: From quality measures to test data generation

F Panse, F Naumann - 2021 IEEE 37th International …, 2021 - ieeexplore.ieee.org
Duplicate detection identifies multiple records in a dataset that represent the same real-
world object. Many such approaches exist, both in research and in industry. To investigate …

A probabilistic data fusion modeling approach for extracting true values from uncertain and conflicting attributes

A Jaradat, F Safieddine, A Deraman, O Ali… - Big Data and Cognitive …, 2022 - mdpi.com
Real-world data obtained from integrating heterogeneous data sources are often multi-
valued, uncertain, imprecise, error-prone, outdated, and have different degrees of accuracy …

Towards deep entity resolution via soft schema matching

C Sun, D Shen - Neurocomputing, 2022 - Elsevier
Entity resolution (ER) plays a key role in data preprocessing. ER identifies records
corresponding to the same real-world entity. Recent years have witnessed a growing trend …

Reproducible experiments on three-dimensional entity resolution with JedAI

G Mandilaras, G Papadakis, L Gagliardelli… - Information Systems, 2021 - Elsevier
In Papadakis et al. (2020), we presented the latest release of JedAI, an open-source
Entity Resolution (ER) system that allows for building a large variety of end-to-end ER …

Tab2Know: Building a knowledge base from tables in scientific papers

B Kruit, H He, J Urbani - The Semantic Web–ISWC 2020: 19th …, 2020 - Springer
Tables in scientific papers contain a wealth of valuable knowledge for the scientific
enterprise. To help the many of us who frequently consult this type of knowledge, we present …

Fine-Grained Tasks for Crowdsourced Entity Resolution

T Nie, H Mao, X Liu, S Yu - Applied Sciences, 2024 - mdpi.com
Entity resolution aims to identify records that point to the same entity in the real world. In
recent years, crowdsourcing approaches have provided new ideas for entity resolution …

Fuzzy Integration of Data Lake Tables

A Khatiwada, R Shraga, RJ Miller - arXiv preprint arXiv:2501.09211, 2025 - arxiv.org
Data integration is an important step in any data science pipeline where the objective is to
unify the information available in different datasets for comprehensive analysis. Full …

Mixed hierarchical networks for deep entity matching

CC Sun, DR Shen - Journal of Computer Science and Technology, 2021 - Springer
Entity matching is a fundamental problem of data integration. It groups records according to
underlying real-world entities. There is a growing trend of entity matching via deep learning …