The family of MapReduce and large-scale data processing systems

S Sakr, A Liu, AG Fayoumi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …

Security and privacy aspects in MapReduce on clouds: A survey

P Derbeko, S Dolev, E Gudes, S Sharma - Computer science review, 2016 - Elsevier
MapReduce is a programming system for distributed processing of large-scale data in an
efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is …

Distributed GraphLab: A framework for machine learning in the cloud

Y Low, J Gonzalez, A Kyrola, D Bickson… - arXiv preprint arXiv …, 2012 - arxiv.org
While high-level data parallel frameworks, like MapReduce, simplify the design and
implementation of large-scale data processing systems, they do not naturally or efficiently …

Scalability! But at what COST?

F McSherry, M Isard, DG Murray - 15th Workshop on Hot Topics in …, 2015 - usenix.org
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a
Single Thread. The COST of a given platform for a given problem is the hardware …

Scalable k-means++

B Bahmani, B Moseley, A Vattani, R Kumar… - arXiv preprint arXiv …, 2012 - arxiv.org
Over half a century old and showing no signs of aging, k-means remains one of the most
popular data processing algorithms. As is well-known, a proper initialization of k-means is …

Streaming submodular maximization: Massive data summarization on the fly

A Badanidiyuru, B Mirzasoleiman, A Karbasi… - Proceedings of the 20th …, 2014 - dl.acm.org
How can one summarize a massive data set "on the fly", i.e., without even having seen it in its
entirety? In this paper, we address the problem of extracting representative elements from a …

Massively parallel computation: Algorithms and applications

S Im, R Kumar, S Lattanzi, B Moseley… - … and Trends® in …, 2023 - nowpublishers.com
The algorithms community has been modeling the underlying key features and constraints of
massively parallel frameworks and using these models to discover new algorithmic …

Distributed submodular maximization: Identifying representative elements in massive data

B Mirzasoleiman, A Karbasi… - Advances in Neural …, 2013 - proceedings.neurips.cc
Many large-scale machine learning problems (such as clustering, non-parametric learning,
kernel machines, etc.) require selecting, out of a massive data set, a manageable …

Analyzing graph structure via linear measurements

KJ Ahn, S Guha, A McGregor - Proceedings of the twenty-third annual ACM …, 2012 - SIAM
We initiate the study of graph sketching, i.e., algorithms that use a limited number of linear
measurements of a graph to determine the properties of the graph. While a graph on n …

Fast greedy algorithms in MapReduce and streaming

R Kumar, B Moseley, S Vassilvitskii… - ACM Transactions on …, 2015 - dl.acm.org
Greedy algorithms are practitioners' best friends—they are intuitive, are simple to implement,
and often lead to very good solutions. However, implementing greedy algorithms in a …