A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Autorep: Automatic relu replacement for fast private network inference

H Peng, S Huang, T Zhou, Y Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
The growth of the Machine-Learning-As-A-Service (MLaaS) market has highlighted clients'
data privacy and security issues. Private inference (PI) techniques using cryptographic …

Lingcn: Structural linearized graph convolutional network for homomorphically encrypted inference

H Peng, R Ran, Y Luo, J Zhao… - Advances in …, 2024 - proceedings.neurips.cc
The growth of Graph Convolution Network (GCN) model sizes has revolutionized
numerous applications, surpassing human performance in areas such as personal …

Ising-traffic: Using ising machine learning to predict traffic congestion under uncertainty

Z Pan, A Sharma, JYC Hu, Z Liu, A Li, H Liu… - Proceedings of the …, 2023 - ojs.aaai.org
This paper addresses the challenges in accurate and real-time traffic congestion prediction
under uncertainty by proposing Ising-Traffic, a dual-model Ising-based traffic prediction …

Accel-gcn: High-performance gpu accelerator design for graph convolution networks

X Xie, H Peng, A Hasan, S Huang… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph
data across various domains, yet their acceleration on mainstream GPUs is challenged by …

Towards sparsification of graph neural networks

H Peng, D Gurevin, S Huang, T Geng… - 2022 IEEE 40th …, 2022 - ieeexplore.ieee.org
As real-world graphs expand in size, larger GNN models with billions of parameters are
deployed. High parameter count in such models makes training and inference on graphs …

Efficient lung cancer image classification and segmentation algorithm based on an improved swin transformer

R Sun, Y Pang, W Li - Electronics, 2023 - mdpi.com
With the advancement of computer technology, transformer models have been applied to the
field of computer vision (CV) after their success in natural language processing (NLP). In …

Understanding the potential of fpga-based spatial acceleration for large language model inference

H Chen, J Zhang, Y Du, S Xiang, Z Yue… - ACM Transactions on …, 2024 - dl.acm.org
Recent advancements in large language models (LLMs) boasting billions of parameters
have generated a significant demand for efficient deployment in inference workloads. While …

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

H Peng, X Xie, K Shivdikar, MD Hasan… - arXiv preprint arXiv …, 2023 - wiki.kaustubh.us
In the acceleration of deep neural network training, the graphics processing unit (GPU) has
become the mainstream platform. GPUs face substantial challenges on Graph Neural …

Rrnet: Towards relu-reduced neural network for two-party computation based private inference

H Peng, S Zhou, Y Luo, N Xu, S Duan, R Ran… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of deep learning (DL) has led to the emergence of privacy and security
concerns. To address these issues, secure Two-party computation (2PC) has been …