Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing
The vertex-centric programming model is an established computational paradigm recently
incorporated into distributed processing frameworks to address challenges in large-scale …
Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks
The term 'Big Data' has spread rapidly in the framework of Data Mining and Business
Intelligence. This new scenario can be defined by means of those problems that cannot be …
[BOOK] Principles of distributed database systems
MT Özsu, P Valduriez - 1999 - Springer
The first edition of this book appeared in 1991 when the technology was new and there were
not too many products. In the Preface to the first edition, we had quoted Michael Stonebraker …
A scalable two-phase top-down specialization approach for data anonymization using MapReduce on cloud
A large number of cloud services require users to share private data like electronic health
records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via …
A survey of large-scale analytical query processing in MapReduce
C Doulkeridis, K Nørvåg - The VLDB journal, 2014 - Springer
Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …
Big data analytics with Datalog queries on Spark
A Shkapsky, M Yang, M Interlandi, H Chiu… - Proceedings of the …, 2016 - dl.acm.org
There is great interest in exploiting the opportunity provided by cloud computing platforms
for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for …
An experimental survey on big data frameworks
Recently, increasingly large amounts of data are generated from a variety of sources.
Existing data processing technologies are not suitable to cope with the huge amounts of …
A comprehensive view of Hadoop research—A systematic literature review
Context: In recent years, the valuable knowledge that can be retrieved from petabyte scale
datasets–known as Big Data–led to the development of solutions to process information …
ShuffleWatcher: Shuffle-aware scheduling in multi-tenant MapReduce clusters
F Ahmad, ST Chakradhar, A Raghunathan… - 2014 USENIX Annual …, 2014 - usenix.org
MapReduce clusters are usually multi-tenant (i.e., shared among multiple users and jobs) for
improving cost and utilization. The performance of jobs in a multi-tenant MapReduce cluster …
Survey of distributed computing frameworks for supporting big data analysis
Distributed computing frameworks are the fundamental component of distributed computing
systems. They provide an essential way to support the efficient processing of big data on …