Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …

Making sense of performance in data analytics frameworks

K Ousterhout, R Rasti, S Ratnasamy… - … USENIX Symposium on …, 2015 - usenix.org
There has been much research devoted to improving the performance of data analytics
frameworks, but comparatively little effort has been spent systematically identifying the …

The stratosphere platform for big data analytics

A Alexandrov, R Bergmann, S Ewen, JC Freytag… - The VLDB Journal, 2014 - Springer
We present Stratosphere, an open-source software stack for parallel data analysis.
Stratosphere brings together a unique set of features that allow the expressive, easy, and …

Coflow: A networking abstraction for cluster applications

M Chowdhury, I Stoica - Proceedings of the 11th ACM Workshop on Hot …, 2012 - dl.acm.org
Cluster computing applications--frameworks like MapReduce and user-facing applications
like search platforms--have application-level requirements and higher-level abstractions to …

A survey of large-scale analytical query processing in MapReduce

C Doulkeridis, K Nørvåg - The VLDB Journal, 2014 - Springer
Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …

MRONLINE: MapReduce online performance tuning

M Li, L Zeng, S Meng, J Tan, L Zhang, AR Butt… - Proceedings of the 23rd …, 2014 - dl.acm.org
MapReduce job parameter tuning is a daunting and time consuming task. The parameter
configuration space is huge; there are more than 70 parameters that impact job …

User-defined functions in modern data engines

Y Foufoulas, A Simitsis - 2023 IEEE 39th International …, 2023 - ieeexplore.ieee.org
Modern data management applications involve complex processing tasks over large
volumes of data. Although this falls naturally within the scope of relational databases, many …

Interruptible tasks: Treating memory pressure as interrupts for highly scalable data-parallel programs

L Fang, K Nguyen, G Xu, B Demsky, S Lu - Proceedings of the 25th …, 2015 - dl.acm.org
Real-world data-parallel programs commonly suffer from great memory pressure, especially
when they are executed to process large datasets. Memory problems lead to excessive GC …

Large scale optimization to minimize network traffic using MapReduce in big data applications

S Neelakandan, S Divyabharathi… - … on Computation of …, 2016 - ieeexplore.ieee.org
The Map-Reduce model simplifies the large scale data handling on commodities group by
abusing parallel map & reduces assignments. The use of this model is beneficial only when …

HadoopProv: Towards Provenance as a First Class Citizen in MapReduce

S Akoush, R Sohan, A Hopper - 5th USENIX Workshop on the Theory …, 2013 - usenix.org
We introduce HadoopProv, a modified version of Hadoop that implements provenance
capture and analysis in MapReduce jobs. It is designed to minimise provenance capture …