Milvus: A purpose-built vector data management system
Recently, there has been a pressing need to manage high-dimensional vector data in data
science and AI applications. This trend is fueled by the proliferation of unstructured data and …
The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds
We present the first learned index that supports predecessor, range queries and updates
within provably efficient time and space bounds in the worst case. In the (static) context of …
Morton filters: faster, space-efficient cuckoo filters via biasing, compression, and decoupled logical sparsity
AD Breslow, NS Jayasena - Proceedings of the VLDB Endowment, 2018 - dl.acm.org
Approximate set membership data structures (ASMDSs) are ubiquitous in computing. They
trade a tunable, often small, error rate (ϵ) for large space savings. The canonical ASMDS is …
BtrBlocks: efficient columnar compression for data lakes
Analytics is moving to the cloud and data is moving into data lakes. These reside on object
storage services like S3 and enable seamless data sharing and system interoperability. To …
Daga: Detecting attacks to in-vehicle networks via n-gram analysis
Recent research showcased several cyber-attacks against unmodified licensed vehicles,
demonstrating the vulnerability of their internal networks. Many solutions have already been …
Roaring bitmaps: Implementation of an optimized software library
Compressed bitmap indexes are used in systems such as Git or Oracle to accelerate
queries. They represent sets and often support operations such as unions, intersections …
Instance-optimized data layouts for cloud analytics workloads
Today, businesses rely on efficiently running analytics on large amounts of operational and
historical data to gain business insights and competitive advantage. Increasingly, such …
Tile-based lightweight integer compression in GPU
GPUs are increasingly used for high-performance and interactive data analytics workloads
due to their capability to accelerate computation using massive parallelism. A key constraint …
Speeding up set intersections in graph algorithms using SIMD instructions
In this paper, we focus on accelerating a widely employed computing pattern, set
intersection, to boost a group of graph algorithms. Graph's adjacency-lists can be naturally …
Identifying insufficient data coverage in databases with multiple relations
In today's data-driven world, it is critical that we use appropriate datasets for analysis and
decision-making. Datasets could be biased because they reflect existing inequalities in the …