BtrBlocks: efficient columnar compression for data lakes

M Kuschewski, D Sauerwein, A Alhomssi… - Proceedings of the ACM …, 2023 - dl.acm.org
Analytics is moving to the cloud and data is moving into data lakes. These reside on object
storage services like S3 and enable seamless data sharing and system interoperability. To …

The FastLanes compression layout: Decoding >100 billion integers per second with scalar code

A Afroozeh, P Boncz - Proceedings of the VLDB Endowment, 2023 - dl.acm.org
The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC
and columnar database formats, in multiple ways. In this paper, we significantly accelerate …

Tile-based lightweight integer compression in GPU

A Shanbhag, BW Yogatama, X Yu… - Proceedings of the 2022 …, 2022 - dl.acm.org
GPUs are increasingly used for high-performance and interactive data analytics workloads
due to their capability to accelerate computation using massive parallelism. A key constraint …

Compressed linear algebra for large-scale machine learning

A Elgohary, M Boehm, PJ Haas, FR Reiss… - Proceedings of the …, 2016 - dl.acm.org
Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only
data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It …

FSST: fast random access string compression

P Boncz, T Neumann, V Leis - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Strings are prevalent in real-world data sets. They often occupy a large fraction of the data
and are slow to process. In this work, we present Fast Static Symbol Table (FSST), a …

From a comprehensive experimental survey to a cost-based selection strategy for lightweight integer compression algorithms

P Damme, A Ungethüm, J Hildebrandt… - ACM Transactions on …, 2019 - dl.acm.org
Lightweight integer compression algorithms are frequently applied in in-memory database
systems to tackle the growing gap between processor speed and main memory bandwidth …

Robust and budget-constrained encoding configurations for in-memory database systems

M Boissier - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
Data encoding has been applied to database systems for decades as it mitigates bandwidth
bottlenecks and reduces storage requirements. But even in the presence of these …

MorphStore: Analytical query engine with a holistic compression-enabled processing model

P Damme, A Ungethüm, J Pietrzyk, A Krause… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we present MorphStore, an open-source in-memory columnar analytical query
engine with a novel holistic compression-enabled processing model. Basically, compression …

Hardware-Oblivious SIMD Parallelism for In-Memory Column-Stores

A Ungethüm, J Pietrzyk, P Damme, A Krause, D Habich… - CIDR, 2020 - dhabich.github.io
Vectorization based on the Single Instruction Multiple Data (SIMD) parallel paradigm is a
core technique to improve query processing performance especially in state-of-the-art in …

Compressed linear algebra for large-scale machine learning

A Elgohary, M Boehm, PJ Haas, FR Reiss… - The VLDB Journal, 2018 - Springer
Large-scale machine learning algorithms are often iterative, using repeated read-only data
access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is …