AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing
Deep learning systems have been successfully applied to Euclidean data such as images,
video, and audio. In many applications, however, information and their relationships are …
A survey on deep learning hardware accelerators for heterogeneous HPC platforms
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …
A survey of accelerating parallel sparse linear algebra
Sparse linear algebra includes the fundamental and important operations in various large-
scale scientific computing and real-world applications. There exists performance bottleneck …
Identifying surface-enhanced Raman spectra with a Raman library using machine learning
Since its discovery, surface-enhanced Raman spectroscopy (SERS) has shown outstanding
promise of identifying trace amounts of unknown molecules in rapid, portable formats …
A new technique to incorporate multiple fermion flavors in tensor renormalization group method for lattice gauge theories
A Yosprakob, J Nishimura, K Okunishi - Journal of High Energy Physics, 2023 - Springer
Abstract: We propose a new technique to incorporate multiple fermion flavors in the tensor
renormalization group method for lattice gauge theories, where fermions are treated by the …
Sparse spiking neural-like membrane systems on graphics processing units
J Hernández-Tello, MÁ Martínez-del-Amor… - arXiv preprint arXiv …, 2024 - arxiv.org
The parallel simulation of Spiking Neural P systems is mainly based on a matrix
representation, where the graph inherent to the neural model is encoded in an adjacency …
Accelerating large sparse neural network inference using GPU task graph parallelism
The ever-increasing size of modern deep neural network (DNN) architectures has put
increasing strain on the hardware needed to implement them. Sparsified DNNs can greatly …
HASpGEMM: Heterogeneity-aware sparse general matrix-matrix multiplication on modern asymmetric multicore processors
Sparse general matrix-matrix multiplication (SpGEMM) is an important kernel in
computational science and engineering, and has been widely studied on homogeneous …
Dedicated hardware accelerators for processing of sparse matrices and vectors: A survey
V Isaac–Chassande, A Evans, Y Durand… - ACM Transactions on …, 2024 - dl.acm.org
Performance in scientific and engineering applications such as computational physics,
algebraic graph problems or Convolutional Neural Networks (CNN), is dominated by the …
A tensor marshaling unit for sparse tensor algebra on general-purpose processors
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow
engine for multicore architectures that accelerates tensor traversals and merging, the most …