[BOOK][B] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

Evaluation of entity resolution approaches on real-world match problems

H Köpcke, A Thor, E Rahm - Proceedings of the VLDB Endowment, 2010 - dl.acm.org
Despite the huge amount of recent research efforts on entity resolution (matching) there has
not yet been a comparative evaluation on the relative effectiveness and efficiency of …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

Frameworks for entity matching: A comparison

H Köpcke, E Rahm - Data & Knowledge Engineering, 2010 - Elsevier
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks
provide several methods and their combination to effectively solve different match tasks. In …

[PDF][PDF] An important aspect of big data: data usability

**建中, 刘显敏 - 2013 - cs.sjtu.edu.cn

Cardinality estimation of approximate substring queries using deep learning

S Kwon, W Jung, K Shim - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Cardinality estimation of an approximate substring query is an important problem in
database systems. Traditional approaches build a summary from the text data and estimate …

Astrid: accurate selectivity estimation for string predicates using deep learning

S Shetiya, S Thirumuruganathan, N Koudas… - Proceedings of the …, 2020 - par.nsf.gov
Accurate selectivity estimation for string predicates is a long-standing research challenge in
databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) …

[HTML][HTML] Parallel set similarity join on big data based on locality-sensitive hashing

MK Sohrabi, H Azgomi - Science of computer programming, 2017 - Elsevier
Due to the huge amount of involved data and time-consuming process of join operations, the
exact-match joins are rarely used for big data. The most common alternative for exact-match …

[PDF][PDF] DuDe: The duplicate detection toolkit

U Draisbach, F Naumann - Proceedings of the International Workshop on …, 2010 - hpi.de
Duplicate detection, also known as entity matching or record linkage, was first defined by
Newcombe et al.[19] and has been a research topic for several decades. The challenge is to …

Comparative evaluation of entity resolution approaches with FEVER

H Köpcke, A Thor, E Rahm - Proceedings of the VLDB Endowment, 2009 - dl.acm.org
We present FEVER, a new evaluation platform for entity resolution approaches. The modular
structure of the FEVER framework supports the incorporation or reconstruction of many …