A systematic survey of general sparse matrix-matrix multiplication
General Sparse Matrix-Matrix Multiplication (SpGEMM) has attracted much attention from
researchers in graph analyzing, scientific computing, and deep learning. Many optimization …
TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental
building blocks in sparse linear solvers, graph processing frameworks and machine learning …
Combinatorial BLAS 2.0: Scaling combinatorial algorithms on distributed-memory systems
Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete
systems, bioinformatics, and chemistry, are often hard to parallelize. The Combinatorial …
Space efficient sequence alignment for SRAM-based computing: X-drop on the Graphcore IPU
Dedicated accelerator hardware has become essential for processing AI-based workloads,
leading to the rise of novel accelerator architectures. Furthermore, fundamental differences …
A tensor marshaling unit for sparse tensor algebra on general-purpose processors
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow
engine for multicore architectures that accelerates tensor traversals and merging, the most …
Distributed-memory parallel contig generation for de novo long-read genome assembly
De novo genome assembly, i.e., rebuilding the sequence of an unknown genome from
redundant and erroneous short sequences, is a key but computationally intensive step in …
A novel method for temporal graph classification based on transitive reduction
Domains such as bio-informatics, social network analysis, and computer vision, describe
relations between entities and cannot be interpreted as vectors or fixed grids, instead, they …
Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
Sparse GEneral Matrix-matrix Multiplication (SpGEMM) is a critical operation in many
applications. Current multithreaded implementations are based on Gustavson's algorithm …
High-Performance Sorting-Based K-mer Counting in Distributed Memory with Flexible Hybrid Parallelism
In generating large quantities of DNA data, high-throughput sequencing technologies
require advanced bioinformatics infrastructures for efficient data analysis. k-mer counting …
Map applications to target exascale architecture with machine-specific performance analysis, including challenges and projections
This Exascale Computing Project (ECP) milestone report summarizes the status of all 30
ECP Applications Development (AD) subprojects at the end of FY20. In October and …