A practical and effective sampling selection strategy for large scale deduplication

G Dal Bianco, R Galante, MA Goncalves… - … on Knowledge and …, 2015 - ieeexplore.ieee.org
The data deduplication task has attracted a considerable amount of attention from the
research community in order to provide effective and efficient solutions. The information …

Efficient interactive training selection for large-scale entity resolution

Q Wang, D Vatsalan, P Christen - … Discovery and Data Mining: 19th Pacific …, 2015 - Springer
Entity resolution (ER) has wide-spread applications in many areas, including e-commerce,
health-care, the social sciences, and crime and fraud detection. A crucial step in ER is the …

Pay-as-you-go configuration of entity resolution

R Maskat, NW Paton, SM Embury - Transactions on Large-Scale Data-and …, 2016 - Springer
Entity resolution, which seeks to identify records that represent the same entity, is an
important step in many data integration and data cleaning applications. However, entity …

Effiziente MapReduce-Parallelisierung von Entity Resolution-Workflows

L Kolb - 2014 - ul.qucosa.de
Abstract (DE) In recent years, the newly emerged paradigm of Infrastructure as a Service
has massively changed the IT world. The provision of …

[PDF] An Efficient Message Lock Encryption Based Data Deduplication For Efficient Cloud Data Storage

D Ganesh - ijcse.com
In order to save storage space and upload bandwidth, data deduplication, a method for
removing duplicate copies of data, has been widely employed in cloud storage. Even when …

[PDF] Sampling Selection Strategy for Large Scale Deduplication for Web Data Search

R Lavanya, P Saranya, D Viji - International Journal of Applied …, 2017 - researchgate.net
The data quality can be reduced due to the presence of duplicate pairs with misspellings,
abbreviations, conflicting data, and redundant entities. Deduplication process manually …

[PDF] A survey: Enhanced block level message locked encryption for data deduplication

P Ahirwar, J Agrawal, S Sharma - International Research Journal of …, 2017 - academia.edu
Data deduplication is one of the emerging techniques to improve the capacity of the storage
media (Hard disk, Tape, CD, DVD, ROM) by removing redundant data and provide storage …

Uma Proposta para Redução do Conjunto de Treinamento Utilizando Aprendizagem Ativa

M Brandao, M Acordi, G Dal Bianco - Escola Regional de Banco de …, 2023 - sol.sbc.org.br
Supervised methods are commonly used in numerous tasks, such as information
classification. However, the learning of a supervised method depends …

[PDF] Sampling Selection Strategy for Large Scale Deduplication of synthetic and real datasets using Apache Spark

G Kumar, K Rupesh, MS Equabal, N Rajesh - 2018 - academia.edu
Due to the enormous increase in the generation of information by a number of sources, the
requirement of several new applications has become mandatory. These applications may be …

Uma estratégia eficiente de treinamento para Programação Genética aplicada a deduplicação de registros

DG Silva - 2016 - tede.ufam.edu.br
The volume of information in digital format has increased considerably in recent
decades, and this has caused concern among administrators of large …