Kubernetes scheduling: Taxonomy, ongoing issues and challenges

C Carrión - ACM Computing Surveys, 2022 - dl.acm.org
Continuous integration enables the development of microservices-based applications using
container virtualization technology. Container orchestration systems such as Kubernetes …

Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Automatic policy generation for Inter-Service access control of microservices

X Li, Y Chen, Z Lin, X Wang, JH Chen - 30th USENIX Security …, 2021 - usenix.org
Cloud applications today are often composed of many microservices. To prevent a
microservice from being abused by other (compromised) microservices, inter-service access …

Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture

N Janbi, I Katib, R Mehmood - Intelligent Systems with Applications, 2023 - Elsevier
Artificial intelligence (AI) research and market have grown rapidly in the last few years, and
this trend is expected to continue with many potential advancements and innovations in this …

Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters

Z Bian, S Li, W Wang, Y You - … of the International Conference for High …, 2021 - dl.acm.org
Efficient GPU resource scheduling is essential to maximize resource utilization and save
training costs for the increasing amount of deep learning workloads in shared GPU clusters …

On a Meta Learning-Based Scheduler for Deep Learning Clusters

J Yang, L Bao, W Liu, R Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has become a dominating type of workloads on AI computing platforms.
The performance of such platforms highly depends on how distributed DL jobs are …

DynamoML: Dynamic Resource Management Operators for Machine Learning Workloads

MC Chiang, J Chou - CLOSER, 2021 - scitepress.org
The recent success of deep learning applications is driven by the computing power of GPUs.
However, as the workflow of deep learning becomes increasingly complicated and resource …

Distributed artificial intelligence: review, taxonomy, framework, and reference architecture

N Janbi, I Katib, R Mehmood - Taxonomy, Framework, and …, 2023 - papers.ssrn.com
Artificial intelligence (AI) research and market have grown rapidly in the last few years and
this trend is expected to continue with many potential advancements and innovations in this …