Software-hardware co-design of heterogeneous SmartNIC system for recommendation models inference and training

A Guo, Y Hao, C Wu, P Haghi, Z Pan, M Si… - Proceedings of the 37th …, 2023 - dl.acm.org
Deep Learning Recommendation Models (DLRMs) are important applications in various
domains and have evolved into one of the largest and most important machine learning …

SmartFuse: Reconfigurable smart switches to accelerate fused collectives in HPC applications

P Haghi, C Tan, A Guo, C Wu, D Liu, A Li… - Proceedings of the 38th …, 2024 - dl.acm.org
Communication switches have sometimes been augmented to process collectives, e.g., in the
IBM BlueGene and Mellanox SHArP switches. In this work, we find that there is a great …

Novel area-efficient and flexible architectures for optimal Ate pairing on FPGA

O Azzouzi, M Anane, M Koudil, M Issad… - The Journal of …, 2024 - Springer
While FPGA is a suitable platform for implementing cryptographic algorithms, there are
several challenges associated with implementing Optimal Ate pairing on FPGA, such as …

Deep quantization of graph neural networks with run-time hardware-aware training

O Hansson, M Grailoo, O Gustafsson… - … Symposium on Applied …, 2024 - Springer
In this paper, we investigate the benefits of hardware-aware quantization in the gFADES
hardware accelerator targeting Graph Convolutional Networks (GCNs). GCNs are a type of …

A Survey of Potential MPI Complex Collectives: Large-Scale Mining and Analysis of HPC Applications

P Haghi, R Marshall, PH Chen, A Skjellum… - arXiv preprint arXiv …, 2023 - arxiv.org
Offload of MPI collectives to network devices, e.g., NICs and switches, is being implemented
as an effective mechanism to improve application performance by reducing inter- and intra …

ACiS: smart switches with application-level acceleration

P Haghi - 2023 - search.proquest.com
Network performance has contributed fundamentally to the growth of supercomputing over
the past decades. In parallel, High Performance Computing (HPC) peak performance has …

ACiS: Complex Processing in the Switch Fabric

P Haghi, A Guo, T Geng, A Skjellum… - arXiv preprint arXiv …, 2025 - arxiv.org
For the last three decades a core use of FPGAs has been for processing communication:
FPGA-based SmartNICs are in widespread use from the datacenter to IoT. Augmenting …

Flexible communication primitives for diverse deployment scenarios of hardware operating systems for FPGAs

Z Tahir - 2025 - search.proquest.com
Communication capabilities of FPGAs, combined with programmability in hardware
(reconfigurable logic) and software (soft-processors), often provide FPGAs a competitive …

Component design for application-directed FPGA system generation frameworks

SL Bandara - 2024 - search.proquest.com
Field Programmable Gate Arrays (FPGAs) can fulfill many critical and contrasting
roles in modern computing due to their combination of powerful computing and …

Optimizing the optimizer: increasing performance efficiency of modern compilers

H Shahzad - 2025 - search.proquest.com
A long-standing goal, which is increasingly important in the post-Moore era, is to augment
system performance by building more intelligent compilers. One of our motivating …