Designing efficient sorting algorithms for manycore GPUs
We describe the design of high-performance parallel radix sort and merge sort routines for
manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix …
A comprehensive performance comparison of CUDA and OpenCL
This paper presents a comprehensive performance comparison between CUDA and
OpenCL. We have selected 16 benchmarks ranging from synthetic applications to real-world …
Relational joins on graphics processors
We present a novel design and implementation of relational join algorithms for new-
generation graphics processing units (GPUs). The most recent GPU features include support …
GPUTeraSort: high performance graphics co-processor sorting for large database management
We present a novel external sorting algorithm using graphics processors (GPUs) on large
databases composed of billions of records and wide keys. Our algorithm uses the data …
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
Sort is a fundamental kernel used in many database operations. In-memory sorts are now
feasible; sort performance is limited by compute flops and main memory bandwidth rather …
[PDF] A comparison of sorting algorithms for the connection machine CM-2
We have implemented three parallel sorting algorithms on the Connection Machine
Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort …
High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing
D Merrill, A Grimshaw - Parallel Processing Letters, 2011 - World Scientific
The need to rank and order data is pervasive, and many algorithms are fundamentally
dependent upon sorting and partitioning operations. Prior to this work, GPU stream …
Revisiting sorting for GPGPU stream architectures
DG Merrill, AS Grimshaw - … of the 19th international conference on …, 2010 - dl.acm.org
This poster presents efficient strategies for sorting large sequences of fixed-length keys (and
values) using GPGPU stream processors. Compared to the state-of-the-art, our radix sorting …
[BOOK][B] Vector microprocessors
K Asanovic - 1998 - search.proquest.com
Most previous research into vector architectures has concentrated on supercomputing
applications and small enhancements to existing vector supercomputer implementations …
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Y Lee, R Avizienis, A Bishara, R Xia… - Proceedings of the 38th …, 2011 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …