TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental
building blocks in sparse linear solvers, graph processing frameworks and machine learning …
HASpGEMM: Heterogeneity-aware sparse general matrix-matrix multiplication on modern asymmetric multicore processors
Sparse general matrix-matrix multiplication (SpGEMM) is an important kernel in
computational science and engineering, and has been widely studied on homogeneous …
AMGT: Algebraic multigrid solver on tensor cores
Algebraic multigrid (AMG) methods are particularly efficient to solve a wide range of sparse
linear systems, due to their good flexibility and adaptability. Even though modern parallel …
MPI+ULT: Overlapping communication and computation with user-level threads
As the core density of future processors keeps increasing, MPI+Threads is becoming a
promising programming model for large scale SMP clusters. Generally speaking, hybrid …
Algebraic multigrid domain and range decomposition (AMG-DD/AMG-RD)
In modern large-scale supercomputing applications, algebraic multigrid (AMG) is a leading
choice for solving matrix equations. However, the high cost of communication relative to that …
Data-driven performance modeling of linear solvers for sparse matrices
Performance of scientific codes is increasingly dependent on the input problem, its data
representation and the underlying hardware with the increase in code and architectural …
FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications
Half-precision hardware support is now almost ubiquitous. In contrast to its active use in AI,
half-precision is less commonly employed in scientific and engineering computing. The …
End-to-end performance modeling of distributed GPU applications
With the growing number of GPU-based supercomputing platforms and GPU-enabled
applications, the ability to accurately model the performance of such applications is …
Improving performance of the hypre iterative solver for Uintah combustion codes on manycore architectures using MPI endpoints and kernel consolidation
The solution of large-scale combustion problems with codes such as the Arches component
of Uintah on next generation computer architectures requires the use of a many and multi …
Optimizing the hypre solver for manycore and GPU architectures
The solution of large-scale combustion problems with codes such as Uintah on modern
computer architectures requires the use of multithreading and GPUs to achieve …