Adatune: Adaptive tensor program compilation made efficient
Deep learning models are computationally intense, and implementations often have to be
highly optimized by experts or hardware vendors to be usable in practice. The DL compiler …
Swift machine learning model serving scheduling: a region based reinforcement learning approach
The success of machine learning has fueled Machine-Learning-as-a-Service (MLaaS):
deploying trained machine learning (ML) models in the cloud to provide low-latency inference …
Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving
J Yu, J Kim, E Seo - 2023 IEEE International Symposium on …, 2023 - ieeexplore.ieee.org
The proportion of machine learning (ML) inference in modern cloud workloads is rapidly
increasing, and graphic processing units (GPUs) are the most preferred computational …
SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
With the increase in the computation intensity of the chip, the mismatch between
computation layer shapes and the available computation resource significantly limits the …
Perseus: Characterizing performance and cost of multi-tenant serving for cnn models
M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Deep learning models are increasingly used for end-user applications, supporting both
novel features such as facial recognition, and traditional features, e.g., web search. To …
Reinforcement-learning-empowered MLaaS scheduling for serving intelligent internet of things
Machine learning (ML) has been embedded in many Internet of Things (IoT) applications
(e.g., smart home and autonomous driving). Yet it is often infeasible to deploy ML models on …
Parax: Boosting deep learning for big data analytics on many-core cpus
Despite the fact that GPUs and accelerators are more efficient in deep learning (DL),
commercial clouds like Facebook and Amazon now heavily use CPUs in DL computation …
FPGA-assisted Design Space Exploration of Parameterized AI Accelerators: A Quickloop Approach
K Inayat, FB Muslim, T Mahmood, J Chung - Journal of Systems …, 2024 - Elsevier
FPGAs facilitate prototyping and debugging, and recently accelerate full-stack simulations due to
their rapid turnaround time (TAT). However, this TAT is restrictive in exhaustive design space …
Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models
The goal of semantic code search or text-to-code search is to retrieve a semantically
relevant code snippet from an existing code database using a natural language query …
Programming Abstractions & Systems for Autonomous Vehicles
S Kalra - 2024 - search.proquest.com
Autonomous Vehicles (AVs) have the potential to revolutionize transportation
through their significant safety, environmental, and mobility benefits. However, despite their …