Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …
Making sense of performance in data analytics frameworks
There has been much research devoted to improving the performance of data analytics
frameworks, but comparatively little effort has been spent systematically identifying the …
The stratosphere platform for big data analytics
We present Stratosphere, an open-source software stack for parallel data analysis.
Stratosphere brings together a unique set of features that allow the expressive, easy, and …
Coflow: A networking abstraction for cluster applications
Cluster computing applications--frameworks like MapReduce and user-facing applications
like search platforms--have application-level requirements and higher-level abstractions to …
A survey of large-scale analytical query processing in MapReduce
Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …
MRONLINE: MapReduce online performance tuning
MapReduce job parameter tuning is a daunting and time-consuming task. The parameter
configuration space is huge; there are more than 70 parameters that impact job …
User-defined functions in modern data engines
Modern data management applications involve complex processing tasks over large
volumes of data. Although this falls naturally within the scope of relational databases, many …
Interruptible tasks: Treating memory pressure as interrupts for highly scalable data-parallel programs
Real-world data-parallel programs commonly suffer from great memory pressure, especially
when they are executed to process large datasets. Memory problems lead to excessive GC …
Large scale optimization to minimize network traffic using MapReduce in big data applications
S Neelakandan, S Divyabharathi… - … on Computation of …, 2016 - ieeexplore.ieee.org
The Map-Reduce model simplifies the large scale data handling on commodities group by
abusing parallel map & reduces assignments. The use of this model is beneficial only when …
HadoopProv: Towards Provenance as a First Class Citizen in MapReduce
We introduce HadoopProv, a modified version of Hadoop that implements provenance
capture and analysis in MapReduce jobs. It is designed to minimise provenance capture …