PetaBricks: A language and compiler for algorithmic choice
It is often impossible to obtain a one-size-fits-all solution for high performance algorithms
when considering different choices for data distributions, parallelism, transformations, and …
Language and compiler support for auto-tuning variable-accuracy algorithms
Approximating ideal program outputs is a common technique for solving computationally
difficult problems, for adhering to processing or timing constraints, and for performance …
Discrete Fourier transform on multicore
This article gives an overview on the techniques needed to implement the discrete Fourier
transform (DFT) efficiently on current multicore systems. The focus is on Intel-compatible …
xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor
F Liu, W Ma, Y Zhao, D Chen, Y Hu, Q Lu… - CCF Transactions on …, 2023 - Springer
High performance extended math library is used by many scientific engineering and artificial
intelligence applications, which usually involves many common mathematical computations …
Topologically adaptive parallel breadth-first search on multicore processors
Breadth-first Search (BFS) is a fundamental graph theory algorithm that is extensively used
to abstract various challenging computational problems. Due to the fine-grained irregular …
MFFT: A GPU accelerated highly efficient mixed-precision large-scale FFT framework
Y Zhao, F Liu, W Ma, H Li, Y Peng… - ACM Transactions on …, 2023 - dl.acm.org
Fast Fourier transform (FFT) is widely used in computing applications in large-scale parallel
programs, and data communication is the main performance bottleneck of FFT and seriously …
Unconventional parallelization of nondeterministic applications
The demand for thread-level-parallelism (TLP) on commodity processors is endless as it is
essential for gaining performance and saving energy. However, TLP in today's programs is …
Using hybrid parallelism to improve memory use in the Uintah framework
The Uintah Software framework was developed to provide an environment for solving fluid-
structure interaction problems on structured adaptive grids on large-scale, long-running …
FAST: A fast stencil autotuning framework based on an optimal-solution space model
Y Luo, G Tan, Z Mo, N Sun - Proceedings of the 29th ACM on …, 2015 - dl.acm.org
Stencil computations comprise an important class of kernels in many scientific computing
applications. As the diversity of both architectures and programming models grow …
Implementation and evaluation of a microthread architecture
K Bousias, L Guang, CR Jesshope… - Journal of Systems …, 2009 - Elsevier
Future many-core processor systems require scalable solutions that conventional
architectures currently do not provide. This paper presents a novel architecture that …