A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …
A survey of coarse-grained reconfigurable architecture and design: Taxonomy, challenges, and applications
As general-purpose processors have hit the power wall and chip fabrication cost escalates
alarmingly, coarse-grained reconfigurable architectures (CGRAs) are attracting increasing …
Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product
Emerging applications such as deep neural networks demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …
A modern primer on processing in memory
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …
Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology
Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large
bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to …
PipeLayer: A pipelined ReRAM-based accelerator for deep learning
Convolution neural networks (CNNs) are the heart of deep learning applications. Recent
works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access …
LegoOS: A disseminated, distributed OS for hardware resource disaggregation
The monolithic server model where a server is the unit of deployment, operation, and failure
is meeting its limits in the face of several recent hardware and application trends. To improve …
PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
Processing-in-memory (PIM) is a promising solution to address the "memory wall"
challenges for future computer systems. Prior proposed PIM architectures put additional …
TETRIS: Scalable and efficient neural network acceleration with 3D memory
The high accuracy of deep neural networks (NNs) has led to the development of NN
accelerators that improve performance by two orders of magnitude. However, scaling these …
Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …