Kubernetes scheduling: Taxonomy, ongoing issues and challenges
C Carrión - ACM Computing Surveys, 2022 - dl.acm.org
Continuous integration enables the development of microservices-based applications using
container virtualization technology. Container orchestration systems such as Kubernetes …
Deep neural networks in the cloud: Review, applications, challenges and research directions
Deep neural networks (DNNs) are currently being deployed as machine learning technology
in a wide range of important real-world applications. DNNs consist of a huge number of …
NetLLM: Adapting large language models for networking
Many networking tasks now employ deep learning (DL) to solve complex prediction and
optimization problems. However, current design philosophy of DL-based algorithms entails …
A survey of Kubernetes scheduling algorithms
As cloud services expand, the need to improve the performance of data center infrastructure
becomes more important. High-performance computing, advanced networking solutions …
Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
Preemptive all-reduce scheduling for expediting distributed DNN training
Data-parallel training is widely used for scaling DNN training over large datasets, using the
parameter server or all-reduce architecture. Communication scheduling has been promising …
Cluster resource scheduling in cloud computing: literature review and research challenges
Scheduling plays a pivotal role in cloud computing systems. Designing an efficient
scheduler is a challenging task. The challenge comes from several aspects, including the …
AI-based resource management in beyond 5G cloud native environment
5G system and beyond targets a large number of emerging applications and services that
will create extra overhead on network traffic. These industrial verticals have aggressive …
Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs
While recent deep learning workload schedulers exhibit excellent performance, it is arduous
to deploy them in practice due to some substantial defects, including inflexible intrusive …