Data-stream sampling: Basic techniques and results

PJ Haas - Data Stream Management: Processing High-Speed …, 2016 - Springer
Perhaps the most basic synopsis of a data stream is a sample of elements from the stream. A
key benefit of such a sample is its flexibility: the sample can serve as input to a wide variety …

Synopses for massive data: Samples, histograms, wavelets, sketches

G Cormode, M Garofalakis, PJ Haas… - … and Trends® in …, 2011 - nowpublishers.com
Methods for Approximate Query Processing (AQP) are essential for dealing with
massive data. They are often the only means of providing interactive response times when …

A unified deep model of learning from both data and queries for cardinality estimation

P Wu, G Cong - Proceedings of the 2021 International Conference on …, 2021 - dl.acm.org
Cardinality estimation is a fundamental problem in database systems. To capture the rich
joint data distributions of a relational table, most of the existing work either uses data as …

Ripple joins for online aggregation

PJ Haas, JM Hellerstein - ACM SIGMOD Record, 1999 - dl.acm.org
We present a new family of join algorithms, called ripple joins, for online processing of multi-
table aggregation queries in a relational database management system (DBMS). Such …

On random sampling over joins

S Chaudhuri, R Motwani, V Narasayya - ACM SIGMOD Record, 1999 - dl.acm.org
A major bottleneck in implementing sampling as a primitive relational operation is the
inefficiency of sampling the output of a query. It is not even known whether it is possible to …

Join synopses for approximate query answering

S Acharya, PB Gibbons, V Poosala… - Proceedings of the 1999 …, 1999 - dl.acm.org
In large data warehousing environments, it is often advantageous to provide fast,
approximate answers to complex aggregate queries based on statistical summaries of the …

QuickSel: Quick selectivity learning with mixture models

Y Park, S Zhong, B Mozafari - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org
Estimating the selectivity of a query is a key step in almost any cost-based query optimizer.
Most of today's databases rely on histograms or samples that are periodically refreshed by …

Estimating join selectivities using bandwidth-optimized kernel density models

M Kiefer, M Heimel, S Breß, V Markl - Proceedings of the VLDB …, 2017 - dl.acm.org
Accurately predicting the cardinality of intermediate plan operations is an essential part of
any modern relational query optimizer. The accuracy of said estimates has a strong and …

[BOOK][B] Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more

SS Lightstone, TJ Teorey, T Nadeau - 2010 - books.google.com
The rapidly increasing volume of information contained in relational databases places a
strain on databases, performance, and maintainability: DBAs are under greater pressure …

Bifocal sampling for skew-resistant join size estimation

S Ganguly, PB Gibbons, Y Matias… - Proceedings of the 1996 …, 1996 - dl.acm.org
This paper introduces bifocal sampling, a new technique for estimating the size of an equi-
join of two relations. Bifocal sampling classifies tuples in each relation into two groups …