A survey on deep learning hardware accelerators for heterogeneous HPC platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …

Agile SoC development with open ESP

P Mantovani, D Giri, G Di Guglielmo… - Proceedings of the 39th …, 2020 - dl.acm.org
ESP is an open-source research platform for heterogeneous SoC design. The platform
combines a modular tile-based architecture with a variety of application-oriented flows for …

The nanoPU: A nanosecond network stack for datacenters

S Ibanez, A Mallery, S Arslan, T Jepsen… - … on Operating Systems …, 2021 - usenix.org
We present the nanoPU, a new NIC-CPU co-design to accelerate an increasingly pervasive
class of datacenter applications: those that utilize many small Remote Procedure Calls …

Manticore: A 4096-core RISC-V chiplet architecture for ultraefficient floating-point computing

F Zaruba, F Schuiki, L Benini - IEEE Micro, 2020 - ieeexplore.ieee.org
Data-parallel problems demand ever growing floating-point (FP) operations per second
under tight area- and energy-efficiency constraints. In this work, we present Manticore, a …

Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs

M Orenes-Vera, A Manocha, J Balkind, F Gao… - Proceedings of the 49th …, 2022 - dl.acm.org
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …

BYOC: A "Bring Your Own Core" framework for heterogeneous-ISA research

J Balkind, K Lim, M Schaffner, F Gao… - Proceedings of the …, 2020 - dl.acm.org
Heterogeneous architectures and heterogeneous-ISA designs are growing areas of
computer architecture and system software research. Unfortunately, this line of research is …

Dalorex: A data-local program execution and architecture for memory-bound applications

M Orenes-Vera, E Tureci, D Wentzlaff… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Applications with low data reuse and frequent irregular memory accesses, such as graph or
sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core …

An open-source platform for high-performance non-coherent on-chip communication

A Kurth, W Rönninger, T Benz… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
On-chip communication infrastructure is a central component of modern systems-on-chip
(SoCs), and it continues to gain importance as the number of cores, the heterogeneity of …

MuchiSim: A simulation framework for design exploration of multi-chip manycore systems

M Orenes-Vera, E Tureci, M Martonosi… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …

PXNOR-BNN: In/with spin-orbit torque MRAM preset-XNOR operation-based binary neural networks

L Chang, X Ma, Z Wang, Y Zhang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Convolutional neural networks (CNNs) have demonstrated superior capability in computer
vision, speech recognition, autonomous driving, and so forth, which are opening up an …