AdaTune: Adaptive tensor program compilation made efficient

M Li, M Zhang, C Wang, M Li - Advances in Neural …, 2020 - proceedings.neurips.cc
Deep learning models are computationally intensive, and implementations often have to be
highly optimized by experts or hardware vendors to be usable in practice. The DL compiler …

Swift machine learning model serving scheduling: a region based reinforcement learning approach

H Qin, S Zawad, Y Zhou, L Yang, D Zhao… - Proceedings of the …, 2019 - dl.acm.org
The success of machine learning has spurred Machine-Learning-as-a-Service (MLaaS):
deploying trained machine learning (ML) models in the cloud to provide low-latency inference …

Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving

J Yu, J Kim, E Seo - 2023 IEEE International Symposium on …, 2023 - ieeexplore.ieee.org
The proportion of machine learning (ML) inference in modern cloud workloads is rapidly
increasing, and graphics processing units (GPUs) are the most preferred computational …

SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration

J Zhuang, Z Yang, S Ji, H Huang, AK Jones… - Proceedings of the …, 2024 - dl.acm.org
As the computational intensity of chips increases, the mismatch between the shapes of
computation layers and the available computation resources significantly limits the …

Perseus: Characterizing performance and cost of multi-tenant serving for CNN models

M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Deep learning models are increasingly used for end-user applications, supporting both
novel features, such as facial recognition, and traditional features, e.g., web search. To …

Reinforcement-learning-empowered MLaaS scheduling for serving intelligent internet of things

H Qin, S Zawad, Y Zhou, S Padhi… - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org
Machine learning (ML) has been embedded in many Internet of Things (IoT) applications
(e.g., smart home and autonomous driving). Yet it is often infeasible to deploy ML models on …

Parax: Boosting deep learning for big data analytics on many-core CPUs

L Yin, Y Zhang, Z Zhang, Y Peng, P Zhao - Proceedings of the VLDB …, 2021 - dl.acm.org
Although GPUs and accelerators are more efficient for deep learning (DL),
commercial clouds such as Facebook and Amazon now heavily use CPUs for DL computation …

FPGA-assisted Design Space Exploration of Parameterized AI Accelerators: A Quickloop Approach

K Inayat, FB Muslim, T Mahmood, J Chung - Journal of Systems …, 2024 - Elsevier
FPGAs facilitate prototyping and debug, and recently accelerate full-stack simulations due to
their rapid turnaround time (TAT). However, this TAT is restrictive in exhaustive design space …

Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models

AD Gotmare, J Li, S Joty, SCH Hoi - Proceedings of the 31st ACM Joint …, 2023 - dl.acm.org
The goal of semantic code search or text-to-code search is to retrieve a semantically
relevant code snippet from an existing code database using a natural language query …

Programming Abstractions & Systems for Autonomous Vehicles

S Kalra - 2024 - search.proquest.com
Autonomous Vehicles (AVs) have the potential to revolutionize transportation
through their significant safety, environmental, and mobility benefits. However, despite their …