Hybrid workload scheduling on HPC systems

Y Fan, Z Lan, P Rich, W Allcock… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Traditionally, on-demand, rigid, and malleable applications have been scheduled and
executed on separate systems. The ever-growing workload demands and rapidly …

DSParLib: A C++ template library for distributed stream parallelism

J Löff, RB Hoffmann, R Pieper, D Griebler… - International Journal of …, 2022 - Springer
Stream processing applications deal with millions of data items continuously generated over
time. Often, they must be processed in real-time and scale performance, which requires the …

Formal semantics and high performance in declarative machine learning using Datalog

J Wang, J Wu, M Li, J Gu, A Das, C Zaniolo - The VLDB Journal, 2021 - Springer
With an escalating arms race to adopt machine learning (ML) in diverse application
domains, there is an urgent need to support declarative machine learning over distributed …

Twister2: Design of a big data toolkit

S Kamburugamuve, K Govindarajan… - Concurrency and …, 2020 - Wiley Online Library
Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …

MLlib*: Fast training of GLMs using Spark MLlib

Z Zhang, J Jiang, W Wu, C Zhang, L Yu… - 2019 IEEE 35th …, 2019 - computer.org
In Tencent Inc., more than 80% of the data are extracted and transformed using
Spark. However, the commonly used machine learning systems are TensorFlow, XGBoost …

Model averaging in distributed machine learning: a case study with Apache Spark

Y Guo, Z Zhang, J Jiang, W Wu, C Zhang, B Cui, J Li - The VLDB Journal, 2021 - Springer
The increasing popularity of Apache Spark has attracted many users to put their data into its
ecosystem. On the other hand, it has been witnessed in the literature that Spark is slow …

CSTF: Large-scale sparse tensor factorizations on distributed platforms

Z Blanco, B Liu, MM Dehnavi - … of the 47th international conference on …, 2018 - dl.acm.org
Tensors, or N-dimensional arrays, are increasingly used to represent multi-dimensional
data. Sparse tensor decomposition algorithms are of particular interest in analyzing and …

Toward high-performance computing and big data analytics convergence: The case of Spark-DIY

S Caíno-Lores, J Carretero, B Nicolae, O Yildiz… - IEEE …, 2019 - ieeexplore.ieee.org
Convergence between high-performance computing (HPC) and big data analytics (BDA) is
currently an established research area that has spawned new opportunities for unifying the …

Spark-DIY: A framework for interoperable Spark operations with high performance block-based data models

S Caíno-Lores, J Carretero, B Nicolae… - 2018 IEEE/ACM 5th …, 2018 - ieeexplore.ieee.org
Today's scientific applications are increasingly relying on a variety of data sources, storage
facilities, and computing infrastructures, and there is a growing demand for data analysis …

Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation

AR Pathak, M Pandey, SS Rautaray - Cluster Computing, 2020 - Springer
The dawn of exascale computing and its convergence with big data analytics has greatly
spurred research interests. The reasons are straightforward. Traditionally, high performance …