Efficient query evaluation on probabilistic databases

N Dalvi, D Suciu - The VLDB Journal, 2007 - Springer
We describe a framework for supporting arbitrarily complex SQL queries with “uncertain”
predicates. The query semantics is based on a probabilistic model and the results are …

Efficient similarity joins for near-duplicate detection

C Xiao, W Wang, X Lin, JX Yu, G Wang - ACM Transactions on Database …, 2011 - dl.acm.org
With the increasing amount of data and the need to integrate data from multiple data
sources, one of the challenging issues is to identify near-duplicate records efficiently. In this …

Efficient similarity search and classification via rank aggregation

R Fagin, R Kumar, D Sivakumar - Proceedings of the 2003 ACM …, 2003 - dl.acm.org
We propose a novel approach to performing efficient similarity search and classification in
high dimensional data. In this framework, the database elements are vectors in a Euclidean …

Trio: A system for integrated management of data, accuracy, and lineage

J Widom - 2004 - ilpubs.stanford.edu
Trio is a new database system that manages not only data, but also the accuracy and
lineage of the data. Approximate (uncertain, probabilistic, incomplete, fuzzy, and imprecise!) …

ULDBs: Databases with uncertainty and lineage

O Benjelloun, AD Sarma, A Halevy, J Widom - 2005 - ilpubs.stanford.edu
This paper introduces ULDBs, an extension of relational databases with simple yet
expressive constructs for representing and manipulating both lineage and …

Web-scale data integration: You can only afford to pay as you go

J Madhavan, SR Jeffery, S Cohen, X Dong… - Proceedings of …, 2007 - datascienceassn.org
The World Wide Web is witnessing an increase in the amount of structured
content–vast heterogeneous collections of structured data are on the rise due to the Deep …

Working models for uncertain data

AD Sarma, O Benjelloun, A Halevy… - … Conference on Data …, 2006 - ieeexplore.ieee.org
This paper explores an inherent tension in modeling and querying uncertain data: simple,
intuitive representations of uncertain data capture many application requirements, but these …

Top-k set similarity joins

C Xiao, W Wang, X Lin, H Shang - 2009 IEEE 25th …, 2009 - ieeexplore.ieee.org
Similarity join is a useful primitive operation underlying many applications, such as near
duplicate Web page detection, data integration, and pattern recognition. Traditional similarity …

Top-k query evaluation with probabilistic guarantees

M Theobald, G Weikum, R Schenkel - … conference on Very large data bases …, 2004 - vldb.org
Top-k queries based on ranking elements of multidimensional datasets are a fundamental
building block for many kinds of information discovery. The best known general-purpose …

Klee: A framework for distributed top-k query algorithms

S Michel, P Triantafillou, G Weikum - … conference on Very large data bases, 2005 - Citeseer
This paper addresses the efficient processing of top-k queries in wide-area distributed data
repositories where the index lists for the attribute values (or text terms) of a query are …