A survey on deep learning hardware accelerators for heterogeneous HPC platforms
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …
Agile SoC development with open ESP
ESP is an open-source research platform for heterogeneous SoC design. The platform
combines a modular tile-based architecture with a variety of application-oriented flows for …
The nanoPU: A nanosecond network stack for datacenters
We present the nanoPU, a new NIC-CPU co-design to accelerate an increasingly pervasive
class of datacenter applications: those that utilize many small Remote Procedure Calls …
Manticore: A 4096-core RISC-V chiplet architecture for ultraefficient floating-point computing
Data-parallel problems demand ever growing floating-point (FP) operations per second
under tight area- and energy-efficiency constraints. In this work, we present Manticore, a …
Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …
BYOC: a "bring your own core" framework for heterogeneous-ISA research
Heterogeneous architectures and heterogeneous-ISA designs are growing areas of
computer architecture and system software research. Unfortunately, this line of research is …
Dalorex: A data-local program execution and architecture for memory-bound applications
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …
An open-source platform for high-performance non-coherent on-chip communication
On-chip communication infrastructure is a central component of modern systems-on-chip
(SoCs), and it continues to gain importance as the number of cores, the heterogeneity of …
MuchiSim: A simulation framework for design exploration of multi-chip manycore systems
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …
PXNOR-BNN: In/with spin-orbit torque MRAM preset-XNOR operation-based binary neural networks
Convolution neural networks (CNNs) have demonstrated superior capability in computer
vision, speech recognition, autonomous driving, and so forth, which are opening up an …