Pipeline-based linear scheduling of big data streams in the cloud

N Tantalaki, S Souravlas, M Roumeliotis… - IEEE …, 2020 - ieeexplore.ieee.org
Nowadays, there is an accelerating need to efficiently and timely handle large amounts of
data that arrives continuously. Streams of big data led to the emergence of several …

An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters

H Hadian, M Farrokh, M Sharifi, A Jafari - The Journal of Supercomputing, 2023 - Springer
Existing Data Stream Processing (DSP) systems perform poorly while encountering
heavy workloads, particularly on clustered set of (heterogeneous) computers. Elasticity and …

Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems

H Herodotou, E Kakoulli - ACM Transactions on Database Systems, 2023 - dl.acm.org
The use of storage tiering is becoming popular in data-intensive compute clusters due to the
recent advancements in storage technologies. The Hadoop Distributed File System, for …

Job scheduler for streaming applications in heterogeneous distributed processing systems

A Al-Sinayyid, M Zhu - The Journal of Supercomputing, 2020 - Springer
In this study, we investigated the problem of scheduling streaming applications on a
heterogeneous cluster environment and, based on our previous work, developed the …

Orchestrating scheduling, grouping and parallelism to enhance the performance of distributed stream computing system

D Sun, H Chen, S Gao, R Buyya - Expert Systems with Applications, 2024 - Elsevier
In a big data stream computing environment, the arrival rate of data streams usually
fluctuates over time, posing a great challenge to the elasticity of system. The performance of …

Linear scheduling of big data streams on multiprocessor sets in the cloud

N Tantalaki, S Souravlas, M Roumeliotis… - IEEE/WIC/ACM …, 2019 - dl.acm.org
Nowadays, there is an accelerating need to efficiently and timely handle large amounts of
data that arrives continuously. Streams of big data led to the emergence of Distributed …

Trident: task scheduling over tiered storage systems in big data platforms

H Herodotou, E Kakoulli - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
The recent advancements in storage technologies have popularized the use of tiered
storage systems in data-intensive compute clusters. The Hadoop Distributed File System …

Smart-mDAG: An intelligent scheduling method for multi-DAG jobs

Y Zhu, B Hu - 2021 International Conference on Information …, 2021 - ieeexplore.ieee.org
Job scheduling is a fundamental problem in cloud data center, which plays an essential role
in the makespan, resource utilization and maintenance of scheduling security, it has …

A topology-aware scheduling strategy for distributed stream computing system

B Li, D Sun, VL Chau, R Buyya - … 2021, Virtual Event, October 28–29 …, 2022 - Springer
Reducing latency has become the focus of task scheduling research in distributed big data
stream computing systems. Currently, most task schedulers in big data stream computing …

A Comparative Study of Spark Schedulers' Performance

A Raju, R Ramanathan… - 2019 4th international …, 2019 - ieeexplore.ieee.org
Big data applications have become an integral part of many intelligent systems, enabling
better business decision making, by extracting useful information from historical data. This …