Google Scholar

Designing efficient sorting algorithms for manycore GPUs

N Satish, M Harris, M Garland - 2009 IEEE International …, 2009 - ieeexplore.ieee.org

We describe the design of high-performance parallel radix sort and merge sort routines for
manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix …

Cited by 938, all 26 versions

[PDF] academia.edu

A comprehensive performance comparison of CUDA and OpenCL

J Fang, AL Varbanescu, H Sips - … International Conference on …, 2011 - ieeexplore.ieee.org

This paper presents a comprehensive performance comparison between CUDA and
OpenCL. We have selected 16 benchmarks ranging from synthetic applications to real-world …

Cited by 466, all 9 versions

[PDF] ust.hk

Relational joins on graphics processors

B He, K Yang, R Fang, M Lu, N Govindaraju… - Proceedings of the …, 2008 - dl.acm.org

We present a novel design and implementation of relational join algorithms for new-
generation graphics processing units (GPUs). The most recent GPU features include support …

Cited by 525, all 18 versions

[PDF] academia.edu

GPUTeraSort: high performance graphics co-processor sorting for large database management

N Govindaraju, J Gray, R Kumar… - Proceedings of the 2006 …, 2006 - dl.acm.org

We present a novel external sorting algorithm using graphics processors (GPUs) on large
databases composed of billions of records and wide keys. Our algorithm uses the data …

Cited by 651, all 18 versions

[PDF] academia.edu

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort

N Satish, C Kim, J Chhugani, AD Nguyen… - Proceedings of the …, 2010 - dl.acm.org

Sort is a fundamental kernel used in many database operations. In-memory sorts are now
feasible; sort performance is limited by compute flops and main memory bandwidth rather …

Cited by 325, all 10 versions

[PDF] acm.org

[PDF] A comparison of sorting algorithms for the Connection Machine CM-2

GE Blelloch, CE Leiserson, BM Maggs… - Proceedings of the third …, 1991 - dl.acm.org

We have implemented three parallel sorting algorithms on the Connection Machine
Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort …

Cited by 499, all 14 versions

High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing

D Merrill, A Grimshaw - Parallel Processing Letters, 2011 - World Scientific

The need to rank and order data is pervasive, and many algorithms are fundamentally
dependent upon sorting and partitioning operations. Prior to this work, GPU stream …

Cited by 255, all 7 versions

[PDF] psu.edu

Revisiting sorting for GPGPU stream architectures

DG Merrill, AS Grimshaw - … of the 19th international conference on …, 2010 - dl.acm.org

This poster presents efficient strategies for sorting large sequences of fixed-length keys (and
values) using GPGPU stream processors. Compared to the state-of-the-art, our radix sorting …

Cited by 251, all 11 versions

[PDF] berkeley.edu

[BOOK] Vector microprocessors

K Asanovic - 1998 - search.proquest.com

Most previous research into vector architectures has concentrated on supercomputing
applications and small enhancements to existing vector supercomputer implementations …

Cited by 252, all 7 versions

[PDF] berkeley.edu

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Y Lee, R Avizienis, A Bishara, R Xia… - Proceedings of the 38th …, 2011 - dl.acm.org

We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …

Cited by 181, all 18 versions

Radix sort for vector multiprocessors

Designing efficient sorting algorithms for manycore GPUs