Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing

M Zaharia, M Chowdhury, T Das, A Dave, J Ma… - 9th USENIX Symposium …, 2012 - usenix.org
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets
programmers perform in-memory computations on large clusters in a fault-tolerant manner …

[PDF] Mesos: A platform for fine-grained resource sharing in the data center

B Hindman, A Konwinski, M Zaharia, A Ghodsi… - … USENIX Symposium on …, 2011 - usenix.org
We present Mesos, a platform for sharing commodity clusters between multiple diverse
cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster …

[PDF] Reining in the outliers in Map-Reduce clusters using Mantri

G Ananthanarayanan, S Kandula… - … USENIX Symposium on …, 2010 - usenix.org
Experience from an operational Map-Reduce cluster reveals that outliers significantly
prolong job completion. The causes for outliers include run-time contention for processor …

Resource provisioning framework for MapReduce jobs with performance goals

A Verma, L Cherkasova, RH Campbell - Middleware 2011: ACM/IFIP …, 2011 - Springer
Many companies are increasingly using MapReduce for efficient large-scale data
processing such as personalized advertising, spam detection, and different data mining …

[PDF] See spot run: Using spot instances for MapReduce workflows

N Chohan, C Castillo, M Spreitzer, M Steinder… - 2nd USENIX Workshop …, 2010 - usenix.org
MapReduce is a scalable and fault-tolerant framework, patented by Google, for computing
embarrassingly parallel reductions. Hadoop is an open-source implementation of Google …

[BOOK] An architecture for fast and general data processing on large clusters

M Zaharia - 2016 - books.google.com
The past few years have seen a major change in computing systems, as growing data
volumes and stalling processor speeds require more and more applications to scale out to …

Efficient provable data possession for hybrid clouds

Y Zhu, H Wang, Z Hu, GJ Ahn, H Hu… - Proceedings of the 17th …, 2010 - dl.acm.org
Provable data possession is a technique for ensuring the integrity of data in outsourced
storage services. In this paper, we propose a cooperative provable data possession scheme …

MOON: MapReduce on opportunistic environments

H Lin, X Ma, J Archuleta, W Feng, M Gardner… - Proceedings of the 19th …, 2010 - dl.acm.org
MapReduce offers an easy-to-use programming paradigm for processing large data sets,
making it an attractive model for distributed volunteer computing systems. However, unlike …

A novel cost-effective dynamic data replication strategy for reliability in cloud data centres

W Li, Y Yang, D Yuan - 2011 IEEE Ninth International …, 2011 - ieeexplore.ieee.org
Nowadays, large-scale Cloud-based applications have placed higher demands on the
storage capacity of data centres. Data in the Cloud need to be stored with high efficiency and …

Making cloud intermediate data fault-tolerant

SY Ko, I Hoque, B Cho, I Gupta - … of the 1st ACM symposium on Cloud …, 2010 - dl.acm.org
Parallel dataflow programs generate enormous amounts of distributed data that are short-
lived, yet are critical for completion of the job and for good run-time performance. We call this …