The family of MapReduce and large-scale data processing systems

S Sakr, A Liu, AG Fayoumi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …

Parallel processing systems for big data: a survey

Y Zhang, T Cao, S Li, X Tian, L Yuan… - Proceedings of the …, 2016 - ieeexplore.ieee.org
The volume, variety, and velocity properties of big data and the valuable information it
contains have motivated the investigation of many new parallel data processing systems in …

The stratosphere platform for big data analytics

A Alexandrov, R Bergmann, S Ewen, JC Freytag… - The VLDB Journal, 2014 - Springer
We present Stratosphere, an open-source software stack for parallel data analysis.
Stratosphere brings together a unique set of features that allow the expressive, easy, and …

Elastic scaling for data stream processing

B Gedik, S Schneider, M Hirzel… - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
This article addresses the profitability problem associated with auto-parallelization of
general-purpose distributed data stream processing applications. Auto-parallelization …

Auto-scaling techniques for elastic data stream processing

T Heinze, V Pappalardo, Z Jerzak… - Proceedings of the 8th …, 2014 - dl.acm.org
Typical use cases like financial trading or monitoring of manufacturing equipment pose huge
challenges regarding end to end latency as well as throughput towards existing data stream …

The DEBS 2012 grand challenge

Z Jerzak, T Heinze, M Fehr, D Gröber… - Proceedings of the 6th …, 2012 - dl.acm.org
The goal of the DEBS Grand Challenge series is to contribute to the Event Processing Grand
Challenge, that serves as a common goal and mechanism for coordinating research …

Pricing approaches for data markets

A Muschalle, F Stahl, A Löser, G Vossen - … , BIRTE 2012, Held at the 38th …, 2013 - Springer
Currently, multiple data vendors utilize the cloud-computing paradigm for trading raw data,
associated analytical services, and analytic results as a commodity good. We observe that …

From conceptual design to performance optimization of ETL workflows: current state of research and open problems

SMF Ali, R Wrembel - The VLDB Journal, 2017 - Springer
In this paper, we discuss the state of the art and current trends in designing and optimizing
ETL workflows. We explain the existing techniques for: (1) constructing a conceptual and a …

A survey on vertical and horizontal scaling platforms for big data analytics

AH Ali - International Journal of Integrated Engineering, 2019 - publisher.uthm.edu.my
There is no doubt that we are entering the era of big data. The challenge is on how to store,
search, and analyze the huge amount of data that is being generated per second. One of the …

Big data 2.0 processing systems: Taxonomy and open challenges

F Bajaber, R Elshawi, O Batarfi, A Altalhi… - Journal of Grid …, 2016 - Springer
Data is a key resource in the modern world. Big data has become a popular term which is
used to describe the exponential growth and availability of data. In practice, the growing …