Performance interference of virtual machines: A survey
The rapid development of cloud computing with virtualization technology has benefited both
academia and industry. For any cloud data center at scale, one of the primary challenges is …
Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …
Llumnix: Dynamic scheduling for large language model serving
Inference serving for large language models (LLMs) is the key to unleashing their potential
in people's daily lives. However, efficient LLM serving remains challenging today because …
Serving heterogeneous machine learning models on Multi-GPU servers with Spatio-Temporal sharing
As machine learning (ML) techniques are applied to a widening range of applications, high
throughput ML inference serving has become critical for online services. Such ML inference …
Gandiva: Introspective cluster scheduling for deep learning
We introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific
knowledge to improve latency and efficiency of training deep learning models in a GPU …
Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …
Adaptive resource efficient microservice deployment in cloud-edge continuum
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …
FaaSFlow: Enable efficient workflow execution for function-as-a-service
Serverless computing (Function-as-a-Service) provides fine-grain resource sharing by
running functions (or Lambdas) in containers. Data-dependent functions are required to be …
AntMan: Dynamic scaling on GPU clusters for deep learning
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …
Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks
S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …