Cost-aware load balancing for multilingual record linkage using MapReduce

D Medhat, AH Yousef, C Salama - Ain Shams Engineering Journal, 2020 - Elsevier
Gathering and processing large amounts of data is increasing every day. Record linkage is
one of the most complex data-intensive tasks, which is used to accurately match records …

Towards reliable data analyses for smart cities

TB Araújo, C Cappiello, NP Kozievitch… - Proceedings of the 21st …, 2017 - dl.acm.org
As cities are becoming green and smart, public information systems are being revamped to
adopt digital technologies. There are several sources (official or not) that can provide …

Applying machine learning techniques for scaling out data quality algorithms in cloud computing environments

DC Nascimento, CE Pires, DG Mestre - Applied Intelligence, 2016 - Springer
Deduplication is the task of identifying the entities in a data set which refer to the same
real-world object. Over the last decades, this problem has been largely investigated and many …

FTRLIM: Distributed instance matching framework for large-scale knowledge graph fusion

H Zhu, X Wang, Y Jiang, H Fan, B Du, Q Liu - Entropy, 2021 - mdpi.com
Instance matching is a key task in knowledge graph fusion, and it is critical to improving the
efficiency of instance matching, given the increasing scale of knowledge graphs. Blocking …

A fine-grained load balancing technique for improving partition-parallel-based ontology matching approaches

TB Araújo, CES Pires, TP da Nóbrega… - Knowledge-Based …, 2016 - Elsevier
Currently, the use of large ontologies in various areas of knowledge is increasing. Since
these ontologies can present overlapping of content, the identification of correspondences …

Towards the efficient parallelization of multi-pass adaptive blocking for entity matching

DG Mestre, CES Pires, DC Nascimento - Journal of Parallel and Distributed …, 2017 - Elsevier
Modern parallel computing programming models, such as MapReduce (MR), have proven to
be powerful tools for efficient parallel execution of data-intensive tasks such as Entity …

Multimodal provenance-based analysis of collaboration in business processes

ML Falci, A Magalhães, A Paes… - … of Information and …, 2021 - journals-sol.sbc.org.br
Modeling business processes as a set of activities to accomplish goals naturally makes them
be executed several times. Usually, such executions produce a large portion of provenance …

Leveraging the entity matching performance through adaptive indexing and efficient parallelization

DG Mestre - 2018 - dspace.sti.ufcg.edu.br
Entity Matching (EM), i.e., the task of identifying all entities referring to the same real-world
object, is an important and difficult task for data sources integration and cleansing. A major …

Estimating record linkage costs in distributed environments

DC Nascimento, CES Pires, TB Araujo… - Journal of Parallel and …, 2020 - Elsevier
Record Linkage (RL) is the task of identifying duplicate entities in a dataset or multiple
datasets. In the era of Big Data, this task has gained notorious attention due to the intrinsic …

A parallel approach for matching large-scale ontologies

TB Araújo, CE Pires, TP da Nobrega… - Journal of Information …, 2015 - periodicos.ufmg.br
Recent years have seen an increasing use of large ontologies in various areas of
knowledge, e.g., health and agriculture. In this scenario, ontology matching is an important …