Hybrid workload scheduling on HPC systems
Traditionally, on-demand, rigid, and malleable applications have been scheduled and
executed on separate systems. The ever-growing workload demands and rapidly …
DSParLib: A C++ template library for distributed stream parallelism
Stream processing applications deal with millions of data items continuously generated over
time. Often, they must be processed in real-time and scale performance, which requires the …
Formal semantics and high performance in declarative machine learning using Datalog
With an escalating arms race to adopt machine learning (ML) in diverse application
domains, there is an urgent need to support declarative machine learning over distributed …
Twister2: Design of a big data toolkit
Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …
MLlib*: Fast training of GLMs using Spark MLlib
In Tencent Inc., more than 80% of the data are extracted and transformed using
Spark. However, the commonly used machine learning systems are TensorFlow, XGBoost …
Model averaging in distributed machine learning: a case study with Apache Spark
The increasing popularity of Apache Spark has attracted many users to put their data into its
ecosystem. On the other hand, it has been witnessed in the literature that Spark is slow …
CSTF: Large-scale sparse tensor factorizations on distributed platforms
Tensors, or N-dimensional arrays, are increasingly used to represent multi-dimensional
data. Sparse tensor decomposition algorithms are of particular interest in analyzing and …
Toward high-performance computing and big data analytics convergence: The case of Spark-DIY
Convergence between high-performance computing (HPC) and big data analytics (BDA) is
currently an established research area that has spawned new opportunities for unifying the …
Spark-DIY: A framework for interoperable Spark operations with high performance block-based data models
Today's scientific applications are increasingly relying on a variety of data sources, storage
facilities, and computing infrastructures, and there is a growing demand for data analysis …
Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation
The dawn of exascale computing and its convergence with big data analytics has greatly
spurred research interests. The reasons are straightforward. Traditionally, high performance …