Multi-core, main-memory joins: sort vs. hash revisited

C Balkesen, G Alonso, J Teubner… - Proceedings of the VLDB …, 2013 - dl.acm.org
In this paper we experimentally study the performance of main-memory, parallel, multi-core
join algorithms, focusing on sort-merge and (radix-) hash join. The relative performance of …

Performance evaluation of Intel® transactional synchronization extensions for high-performance computing

RM Yoo, CJ Hughes, K Lai, R Rajwar - Proceedings of the International …, 2013 - dl.acm.org
Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX)
in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically …

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

O Polychroniou, KA Ross - Proceedings of the 2014 ACM SIGMOD …, 2014 - dl.acm.org
Analytical database systems can achieve high throughput main-memory query execution by
being aware of the dynamics of highly-parallel modern hardware. Such systems rely on …

High-speed query processing over high-speed networks

W Rödiger, T Mühlbauer, A Kemper… - arXiv preprint arXiv …, 2015 - arxiv.org
Modern database clusters entail two levels of networks: connecting CPUs and NUMA
regions inside a single server in the small and multiple servers in the large. The huge …

A memory bandwidth-efficient hybrid radix sort on GPUs

E Stehle, HA Jacobsen - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Sorting is at the core of many database operations, such as index creation, sort-merge joins,
and user-requested output sorting. As GPUs are emerging as a promising platform to …

Track join: distributed joins with minimal network traffic

O Polychroniou, R Sen, KA Ross - Proceedings of the 2014 ACM …, 2014 - dl.acm.org
Network communication is the slowest component of many operators in distributed parallel
databases deployed for large-scale analytics. Whereas considerable work has focused on …

BD-CATS: big data clustering at trillion particle scale

MMA Patwary, S Byna, NR Satish… - Proceedings of the …, 2015 - dl.acm.org
Modern cosmology and plasma physics codes are now capable of simulating trillions of
particles on petascale systems. Each timestep output from such simulations is on the order …

SIMD- and cache-friendly algorithm for sorting an array of structures

H Inoue, K Taura - Proceedings of the VLDB Endowment, 2015 - dl.acm.org
This paper describes our new algorithm for sorting an array of structures by efficiently
exploiting the SIMD instructions and cache memory of today's processors. Recently …

PARADIS: An efficient parallel algorithm for in-place radix sort

M Cho, D Brand, R Bordawekar, U Finkler… - Proceedings of the …, 2015 - dl.acm.org
In-place radix sort is a popular distribution-based sorting algorithm for short numeric or string
keys due to its linear run-time and constant memory complexity. However, efficient …

Locality-sensitive operators for parallel main-memory database clusters

W Rödiger, T Mühlbauer… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org
The growth in compute speed has outpaced the growth in network bandwidth over the last
decades. This has led to an increasing performance gap between local and distributed …