A Survey of Design and Optimization for Systolic Array-based DNN Accelerators
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …
In-memory computing with emerging nonvolatile memory devices
The von Neumann bottleneck and memory wall have posed fundamental limitations in
latency and energy consumption of modern computers based on von Neumann architecture …
A modern primer on processing in memory
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …
Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology
Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large
bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to …
LegoOS: A disseminated, distributed OS for hardware resource disaggregation
The monolithic server model where a server is the unit of deployment, operation, and failure
is meeting its limits in the face of several recent hardware and application trends. To improve …
Neural cache: Bit-serial in-cache acceleration of deep neural networks
This paper presents the Neural Cache architecture, which re-purposes cache structures to
transform them into massively parallel compute units capable of running inferences for Deep …
Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
DRISA: A DRAM-based reconfigurable in-situ accelerator
Data movement between the processing units and the memory in traditional von Neumann
architecture is creating the "memory wall" problem. To bridge the gap, two approaches, the …
Google workloads for consumer devices: Mitigating data movement bottlenecks
We are experiencing an explosive growth in the number of consumer devices, including
smartphones, tablets, web-based computers such as Chromebooks, and wearable devices …
Breaking the von Neumann bottleneck: architecture-level processing-in-memory technology
The “memory wall” problem or so-called von Neumann bottleneck limits the efficiency of
conventional computer architectures, which move data from memory to CPU for …