BtrBlocks: efficient columnar compression for data lakes
Analytics is moving to the cloud and data is moving into data lakes. These reside on object
storage services like S3 and enable seamless data sharing and system interoperability. To …
The FastLanes compression layout: Decoding >100 billion integers per second with scalar code
The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC
and columnar database formats, in multiple ways. In this paper, we significantly accelerate …
Tile-based lightweight integer compression in GPU
GPUs are increasingly used for high-performance and interactive data analytics workloads
due to their capability to accelerate computation using massive parallelism. A key constraint …
Compressed linear algebra for large-scale machine learning
Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only
data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It …
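To make the idea of running matrix-vector multiplications directly on compressed data more concrete, here is a minimal sketch. It assumes a simple column-wise encoding that maps each distinct non-zero value to the list of row offsets where it occurs, loosely in the spirit of offset-list encoding; it is a simplification, not the paper's actual format or algorithm, and the helper names compress_columns and compressed_matvec are hypothetical. The point it shows: for columns with few distinct values, the product value * v[j] is computed once per distinct value rather than once per row.

```python
from collections import defaultdict


def compress_columns(matrix):
    """Hypothetical encoder: each column becomes {distinct non-zero value: [row offsets]}."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    columns = []
    for j in range(n_cols):
        groups = defaultdict(list)
        for i in range(n_rows):
            value = matrix[i][j]
            if value != 0:  # zeros stay implicit and need no offsets
                groups[value].append(i)
        columns.append(dict(groups))
    return n_rows, columns


def compressed_matvec(n_rows, columns, v):
    """Compute q = X @ v without decompressing X."""
    q = [0.0] * n_rows
    for j, groups in enumerate(columns):
        for value, rows in groups.items():
            contrib = value * v[j]  # one multiply per distinct value, not per row
            for i in rows:
                q[i] += contrib
    return q
```

For example, with X = [[1, 0, 2], [1, 3, 0], [0, 3, 2], [1, 0, 2]] and v = [1.0, 2.0, 0.5], compressed_matvec(*compress_columns(X), v) returns [2.0, 7.0, 7.0, 2.0], the same result as the dense product.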
FSST: fast random access string compression
Strings are prevalent in real-world data sets. They often occupy a large fraction of the data
and are slow to process. In this work, we present Fast Static Symbol Table (FSST), a …
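The snippet is cut off before describing how FSST works, so the following is only a sketch of the general idea of static symbol-table string compression, under stated assumptions: a shared table of up to 255 symbols (each 1 to 8 bytes) is given in advance (here hand-picked and hypothetical; how FSST builds its table is not shown), each symbol is replaced by a 1-byte code using greedy longest match, and code 255 is assumed to act as an escape followed by one literal byte. Because every string is encoded independently against the same table, a single string can be decoded without touching its neighbors, which is what makes random access cheap.

```python
ESCAPE = 255  # assumed reserved code: the next byte is a literal


def encode(data: bytes, symbols: list[bytes]) -> bytes:
    """Greedy longest-match substitution of table symbols by 1-byte codes."""
    assert len(symbols) <= 255 and all(1 <= len(s) <= 8 for s in symbols)
    out = bytearray()
    pos = 0
    while pos < len(data):
        best_code, best_len = None, 0
        for code, sym in enumerate(symbols):
            if len(sym) > best_len and data.startswith(sym, pos):
                best_code, best_len = code, len(sym)
        if best_code is None:
            out += bytes((ESCAPE, data[pos]))  # no symbol matches: escape one byte
            pos += 1
        else:
            out.append(best_code)
            pos += best_len
    return bytes(out)


def decode(codes: bytes, symbols: list[bytes]) -> bytes:
    """Expand each code back into its symbol; escaped bytes are copied verbatim."""
    out = bytearray()
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == ESCAPE:
            out.append(codes[i + 1])
            i += 2
        else:
            out += symbols[code]
            i += 1
    return bytes(out)
```

With the hypothetical table [b"http://", b"www.", b".com", b"e"], the 22-byte string b"http://www.example.com" encodes to 15 bytes and decodes back unchanged; the unmatched letters each cost two bytes, so the benefit depends entirely on how well the table fits the data.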
From a comprehensive experimental survey to a cost-based selection strategy for lightweight integer compression algorithms
Lightweight integer compression algorithms are frequently applied in in-memory database
systems to tackle the growing gap between processor speed and main memory bandwidth …
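As an illustration of what a lightweight integer compression algorithm looks like, the sketch below combines frame-of-reference (subtract the block minimum) with fixed-width bit packing. This is one common member of the family, not the survey's cost-based selection strategy; the function names for_encode and for_decode are hypothetical, and a Python integer stands in for the bit buffer that a real implementation would manage word by word, typically with SIMD.

```python
def for_encode(values: list[int]) -> tuple[int, int, int, bytes]:
    """Frame-of-reference + bit packing: store min, bit width, count, packed deltas."""
    assert values, "sketch assumes a non-empty block"
    base = min(values)
    width = max(v - base for v in values).bit_length()  # 0 if all values are equal
    buffer = 0
    for i, v in enumerate(values):
        buffer |= (v - base) << (i * width)  # delta i occupies bits [i*width, (i+1)*width)
    n_bytes = (len(values) * width + 7) // 8
    return base, width, len(values), buffer.to_bytes(n_bytes, "little")


def for_decode(base: int, width: int, count: int, payload: bytes) -> list[int]:
    """Unpack each width-bit delta and add the reference value back."""
    buffer = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [base + ((buffer >> (i * width)) & mask) for i in range(count)]
```

For the block [1000, 1003, 1001, 1007], the deltas 0, 3, 1, 7 fit in 3 bits each, so the payload is 2 bytes instead of the 16 bytes needed for four 32-bit integers. Which algorithm wins in practice depends on the data distribution, which is why choosing among such algorithms is a problem in its own right.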
Robust and budget-constrained encoding configurations for in-memory database systems
M Boissier - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
Data encoding has been applied to database systems for decades as it mitigates bandwidth
bottlenecks and reduces storage requirements. But even in the presence of these …
MorphStore: Analytical query engine with a holistic compression-enabled processing model
In this paper, we present MorphStore, an open-source in-memory columnar analytical query
engine with a novel holistic compression-enabled processing model. Basically, compression …
Hardware-Oblivious SIMD Parallelism for In-Memory Column-Stores
Vectorization based on the Single Instruction Multiple Data (SIMD) parallel paradigm is a
core technique to improve query processing performance especially in state-of-the-art in …
Compressed linear algebra for large-scale machine learning
Large-scale machine learning algorithms are often iterative, using repeated read-only data
access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is …