Data and information quality

C Batini, M Scannapieco - Cham, Switzerland: Springer International …, 2016 - Springer
This book is the result of a study path that started in 2006, when the two authors of this book
published the book Data Quality: Concepts, Methodologies and Techniques. After 8 years …

[BOOK] Foundations of data quality management

W Fan, F Geerts - 2012 - books.google.com
Data quality is one of the most important problems in data management. A database system
typically aims to support the creation, maintenance and use of large amounts of data …

Auto-em: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning

C Zhao, Y He - The World Wide Web Conference, 2019 - dl.acm.org
Entity matching (EM), also known as entity resolution, fuzzy join, and record linkage, refers to
the process of identifying records corresponding to the same real-world entities from …

Reasoning about record matching rules

W Fan, X Jia, J Li, S Ma - Proceedings of the VLDB Endowment …, 2009 - research.ed.ac.uk
To accurately match records it is often necessary to utilize the semantics of the data.
Functional dependencies (FDs) have proven useful in identifying tuples in a clean relation …

Large-scale deduplication with constraints using dedupalog

A Arasu, C Ré, D Suciu - 2009 IEEE 25th International …, 2009 - ieeexplore.ieee.org
We present a declarative framework for collective deduplication of entity references in the
presence of constraints. Constraints occur naturally in many data cleaning domains and can …

Differential dependencies: Reasoning and discovery

S Song, L Chen - ACM Transactions on Database Systems (TODS), 2011 - dl.acm.org
The importance of difference semantics (e.g., “similar” or “dissimilar”) has been recently
recognized for declaring dependencies among various types of data, such as numerical …

[BOOK] Data Cleaning

V Ganti, AD Sarma - 2022 - books.google.com
Data warehouses consolidate various activities of a business and often form the backbone
for generating reports that support important business decisions. Errors in data tend to creep …

Efficient approximate entity extraction with edit distance constraints

W Wang, C Xiao, X Lin, C Zhang - Proceedings of the 2009 ACM …, 2009 - dl.acm.org
Named entity recognition aims at extracting named entities from unstructured text. A recent
trend of named entity recognition is finding approximate matches in the text with respect to a …
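The edit-distance constraint in this title can be illustrated with a naive sketch: a window of text tokens is reported as an approximate match of a dictionary entity when their Levenshtein distance is at most a threshold `tau`. This brute-force enumeration is an assumption for illustration only, not the algorithm of Wang et al.; the names `edit_distance`, `approx_matches`, and `tau` are hypothetical.

```python
# Hypothetical illustration: brute-force approximate entity extraction
# under an edit-distance threshold. NOT the cited paper's algorithm.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approx_matches(text: str, dictionary: list[str], tau: int) -> list[tuple[int, int, str]]:
    """Return (start, end, entity) for every token window text[start:end]
    whose edit distance to some dictionary entity is at most tau."""
    tokens = text.split()
    results = []
    # Only consider windows no longer (in tokens) than the longest entity.
    max_len = max((len(e.split()) for e in dictionary), default=0)
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            window = " ".join(tokens[start:end])
            for entity in dictionary:  # compare every window to every entity
                if edit_distance(window, entity) <= tau:
                    results.append((start, end, entity))
    return results
```

For example, `approx_matches("patients with diabetis mellitus", ["diabetes mellitus"], 1)` returns `[(2, 4, "diabetes mellitus")]`, because the misspelled window is one substitution away from the entity. Efficient methods, the subject of the cited paper, avoid comparing every window against every entity; this sketch makes no attempt to do so.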

Dynamic constraints for record matching

W Fan, H Gao, X Jia, J Li, S Ma - The VLDB Journal, 2011 - Springer
This paper investigates constraints for matching records from unreliable data sources. (a) We
introduce a class of matching dependencies (MDs) for specifying the semantics of unreliable …

[PDF] An important aspect of big data: data usability (大数据的一个重要方面：数据可用性)

J Li, X Liu - Journal of Computer Research and Development (计算机研究与发展), 2013 - cs.sjtu.edu.cn