A Survey of Design and Optimization for Systolic Array-based DNN Accelerators
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …
STONNE: Enabling cycle-level microarchitectural simulation for DNN inference accelerators
The design of specialized architectures for accelerating the inference procedure of Deep
Neural Networks (DNNs) is a booming area of research nowadays. While first-generation …
MTIA: First generation silicon targeting Meta's recommendation systems
Meta has traditionally relied on using CPU-based servers for running inference workloads,
specifically Deep Learning Recommendation Models (DLRM), but the increasing compute …
FLAT: An optimized dataflow for mitigating attention bottlenecks
Attention mechanisms, primarily designed to capture pairwise correlations between words,
have become the backbone of machine learning, expanding beyond natural language …
Xel: A cloud-agnostic data platform for the design-driven building of high-availability data science services
This paper presents Xel, a cloud-agnostic data platform for the design-driven building of
high-availability data science services as a support tool for data-driven decision-making. We …
LIBRA: Enabling Workload-Aware Multi-Dimensional Network Topology Optimization for Distributed Training of Large AI Models
As model sizes in machine learning continue to scale, distributed training is necessary to
accommodate model weights within each device and to reduce training time. However, this …
STIFT: A spatio-temporal integrated folding tree for efficient reductions in flexible DNN accelerators
Increasing deployment of Deep Neural Networks (DNNs) recently fueled interest in the
development of specific accelerator architectures capable of meeting their stringent …
Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
Reconfigurable accelerators for deep neural networks (DNNs) promise to improve
performance such as inference latency. STONNE is the first cycle-accurate simulator for …
Neural-Network-Assisted Packet Accelerators for Internet of Things Network Systems
Major device nodes within the Internet of Things (IoT) system collect and store information
in bit forms of 0's and 1's regardless of its repetition. The nodes do not possess the capability …
Multi-channel medium access control protocols for wireless networks within computing packages
Wireless communications at the chip scale emerge as an interesting complement to
traditional wire-based approaches thanks to their low latency, inherent broadcast nature …