Learned cardinality estimation: An in-depth study

K Kim, J Jung, I Seo, WS Han, K Choi… - Proceedings of the 2022 …, 2022 - dl.acm.org
Learned cardinality estimation (CE) has recently gained significant attention for replacing
long-studied traditional CE with machine learning, especially for deep learning. However …

Machine unlearning in learned databases: An experimental analysis

M Kurmanji, E Triantafillou, P Triantafillou - Proceedings of the ACM on …, 2024 - dl.acm.org
Machine learning models based on neural networks (NNs) are enjoying ever-increasing
attention in the Database (DB) community, both in research and practice. However, an …

On join sampling and the hardness of combinatorial output-sensitive join algorithms

S Deng, S Lu, Y Tao - Proceedings of the 42nd ACM SIGMOD-SIGACT …, 2023 - dl.acm.org
We present a dynamic index structure for join sampling. Built for an (equi-)join Q (let IN be
the total number of tuples in the input relations of Q), the structure uses Õ(IN) space …

Detect, distill and update: Learned DB systems facing out of distribution data

M Kurmanji, P Triantafillou - Proceedings of the ACM on Management of …, 2023 - dl.acm.org
Machine Learning (ML) is changing DBs as many DB components are being replaced by ML
models. One open problem in this setting is how to update such ML models in the presence …

Joinboost: Grow trees over normalized data using only SQL

Z Huang, R Sen, J Liu, E Wu - arXiv preprint arXiv:2307.00422, 2023 - arxiv.org
Although dominant for tabular data, ML libraries that train tree models over normalized
databases (e.g., LightGBM, XGBoost) require the data to be denormalized as a single table …

Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality

X Tang, F Zhang, S Zhang, Y Liu, B He, B He… - Proceedings of the …, 2024 - dl.acm.org
Sampling is one of the most widely employed approximations in big data processing.
Among various challenges in sampling design, sampling for join is particularly intriguing yet …

Weighted random sampling over joins

M Shekelyan, G Cormode, P Triantafillou… - arXiv preprint arXiv …, 2022 - arxiv.org
Joining records with all other records that meet a linkage condition can result in an
astronomically large number of combinations due to many-to-many relationships. For such …

Learned Optimizer for Online Approximate Query Processing in Data Exploration

L Liu, H Zhang, Y Jing, Z He, K Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In interactive data exploration, approximate query processing (AQP) can be used to
quickly return query results at the cost of accuracy. For online AQP, the sampler can be …

Compiling discrete probabilistic programs for vectorized exact inference

J Pan, A Shaikhha - Proceedings of the 32nd ACM SIGPLAN …, 2023 - dl.acm.org
Probabilistic programming languages (PPLs) are essential for reasoning under uncertainty.
Even though many real-world probabilistic programs involve discrete distributions, the state …

Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration

H Zhang, Y Jing, Z He, K Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
For interactive data exploration, approximate query processing (AQP) is a useful approach
that usually uses samples to provide a timely response for queries by trading query …