CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture

J Zhuang, J Lau, H Ye, Z Yang, Y Du, J Lo… - Proceedings of the …, 2023 - dl.acm.org
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …

Neural-enhanced live streaming: Improving live video ingest via online learning

J Kim, Y Jung, H Yeo, J Ye, D Han - … of the Annual conference of the …, 2020 - dl.acm.org
Live video accounts for a significant volume of today's Internet video. Despite a large
number of efforts to enhance user quality of experience (QoE) both at the ingest and …

AEVA: Black-box backdoor detection using adversarial extreme value analysis

J Guo, A Li, C Liu - arXiv preprint arXiv:2110.14880, 2021 - arxiv.org
Deep neural networks (DNNs) are proved to be vulnerable against backdoor attacks. A
backdoor is often embedded in the target DNNs through injecting a backdoor trigger into …

Freely scalable and reconfigurable optical hardware for deep learning

L Bernstein, A Sludds, R Hamerly, V Sze, J Emer… - Scientific reports, 2021 - nature.com
As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy
and solve more complex problems. This trend has been enabled by an increase in available …

A hardware accelerator for protocol buffers

S Karandikar, C Leary, C Kennelly, J Zhao… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Serialization frameworks are a fundamental component of scale-out systems, but introduce
significant compute overheads. However, they are amenable to acceleration with …

DSConv: Efficient convolution operator

MG Nascimento, R Fawcett… - Proceedings of the …, 2019 - openaccess.thecvf.com
Quantization is a popular way of increasing the speed and lowering the memory usage of
Convolution Neural Networks (CNNs). When labelled training data is available, network …

SSR: Spatial sequential hybrid architecture for latency throughput tradeoff in transformer acceleration

J Zhuang, Z Yang, S Ji, H Huang, AK Jones… - Proceedings of the …, 2024 - dl.acm.org
With the increase in the computation intensity of the chip, the mismatch between
computation layer shapes and the available computation resource significantly limits the …

Lightning: A reconfigurable photonic-electronic SmartNIC for fast and energy-efficient inference

Z Zhong, M Yang, J Lang, C Williams… - Proceedings of the …, 2023 - dl.acm.org
The massive growth of machine learning-based applications and the end of Moore's law
have created a pressing need to redesign computing platforms. We propose Lightning, the …

Enabling edge-cloud video analytics for robotics applications

Y Wang, W Wang, D Liu, X **, J Jiang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emerging deep learning-based video analytics tasks demand computation-intensive neural
networks and powerful computing resources on the cloud to achieve high accuracy. Due to …

Compiling KB-sized machine learning models to tiny IoT devices

S Gopinath, N Ghanathe, V Seshadri… - Proceedings of the 40th …, 2019 - dl.acm.org
Recent advances in machine learning (ML) have produced KiloByte-size models that can
directly run on constrained IoT devices. This approach avoids expensive communication …