Multi-core, main-memory joins: sort vs. hash revisited
In this paper we experimentally study the performance of main-memory, parallel, multi-core
join algorithms, focusing on sort-merge and (radix-) hash join. The relative performance of …
Performance evaluation of Intel® transactional synchronization extensions for high-performance computing
Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX)
in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically …
A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort
O Polychroniou, KA Ross - Proceedings of the 2014 ACM SIGMOD …, 2014 - dl.acm.org
Analytical database systems can achieve high throughput main-memory query execution by
being aware of the dynamics of highly-parallel modern hardware. Such systems rely on …
High-speed query processing over high-speed networks
W Rödiger, T Mühlbauer, A Kemper… - arXiv preprint arXiv …, 2015 - arxiv.org
Modern database clusters entail two levels of networks: connecting CPUs and NUMA
regions inside a single server in the small and multiple servers in the large. The huge …
A memory bandwidth-efficient hybrid radix sort on GPUs
E Stehle, HA Jacobsen - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Sorting is at the core of many database operations, such as index creation, sort-merge joins,
and user-requested output sorting. As GPUs are emerging as a promising platform to …
Track join: distributed joins with minimal network traffic
Network communication is the slowest component of many operators in distributed parallel
databases deployed for large-scale analytics. Whereas considerable work has focused on …
BD-CATS: big data clustering at trillion particle scale
Modern cosmology and plasma physics codes are now capable of simulating trillions of
particles on petascale systems. Each timestep output from such simulations is on the order …
SIMD- and cache-friendly algorithm for sorting an array of structures
This paper describes our new algorithm for sorting an array of structures by efficiently
exploiting the SIMD instructions and cache memory of today's processors. Recently …
PARADIS: An efficient parallel algorithm for in-place radix sort
M Cho, D Brand, R Bordawekar, U Finkler… - Proceedings of the …, 2015 - dl.acm.org
In-place radix sort is a popular distribution-based sorting algorithm for short numeric or string
keys due to its linear run-time and constant memory complexity. However, efficient …
Locality-sensitive operators for parallel main-memory database clusters
W Rödiger, T Mühlbauer… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org
The growth in compute speed has outpaced the growth in network bandwidth over the last
decades. This has led to an increasing performance gap between local and distributed …