I-GCN: A graph convolutional network accelerator with runtime locality enhancement through islandization
Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three
years. Compared with other deep learning modalities, high-performance hardware …
SISA: Set-centric instruction set architecture for graph mining on processing-in-memory systems
Simple graph algorithms such as PageRank have been the target of numerous hardware
accelerators. Yet, there also exist much more complex graph mining algorithms for problems …
GraphIt: A high-performance graph DSL
The performance bottlenecks of graph applications depend not only on the algorithm and
the underlying hardware, but also on the size and structure of the input graph. As a result …
Exploiting locality in graph analytics through hardware-accelerated traversal scheduling
Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches
are of little help because the irregular structure of graphs causes seemingly random memory …
FeatGraph: A flexible and efficient backend for graph neural network systems
Graph neural networks (GNNs) are gaining popularity as a promising approach to machine
learning on graphs. Unlike traditional graph workloads where each vertex/edge is …
GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs
Graph processing is typically memory bound due to low compute to memory access ratio
and irregular data access pattern. The emerging high-bandwidth memory (HBM) delivers …
Terrace: A hierarchical graph container for skewed dynamic graphs
Various applications model problems as streaming graphs, which need to quickly apply a
stream of updates and run algorithms on the updated graph. Furthermore, many dynamic …
PHI: Architectural support for synchronization- and bandwidth-efficient commutative scatter updates
Many applications perform frequent scatter update operations to large data structures. For
example, in push-style graph algorithms, processing each vertex requires updating the data …
When is graph reordering an optimization? Studying the effect of lightweight graph reordering across applications and input graphs
Graph processing applications are notorious for exhibiting poor cache locality due to an
irregular memory access pattern. However, prior work on graph reordering has observed …
Traversing large graphs on GPUs with unified memory
Due to the limited capacity of GPU memory, the majority of prior work on graph applications
on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware …