The family of MapReduce and large-scale data processing systems
In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …
Security and privacy aspects in MapReduce on clouds: A survey
MapReduce is a programming system for distributed processing of large-scale data in an
efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is …
Distributed GraphLab: A framework for machine learning in the cloud
While high-level data parallel frameworks, like MapReduce, simplify the design and
implementation of large-scale data processing systems, they do not naturally or efficiently …
Scalability! But at what COST?
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a
Single Thread. The COST of a given platform for a given problem is the hardware …
Scalable k-means++
Over half a century old and showing no signs of aging, k-means remains one of the most
popular data processing algorithms. As is well-known, a proper initialization of k-means is …
Streaming submodular maximization: Massive data summarization on the fly
How can one summarize a massive data set "on the fly", i.e., without even having seen it in its
entirety? In this paper, we address the problem of extracting representative elements from a …
Massively parallel computation: Algorithms and applications
The algorithms community has been modeling the underlying key features and constraints of
massively parallel frameworks and using these models to discover new algorithmic …
Distributed submodular maximization: Identifying representative elements in massive data
B Mirzasoleiman, A Karbasi… - Advances in Neural …, 2013 - proceedings.neurips.cc
Many large-scale machine learning problems (such as clustering, non-parametric learning,
kernel machines, etc.) require selecting, out of a massive data set, a manageable …
Analyzing graph structure via linear measurements
KJ Ahn, S Guha, A McGregor - Proceedings of the twenty-third annual ACM …, 2012 - SIAM
We initiate the study of graph sketching, i.e., algorithms that use a limited number of linear
measurements of a graph to determine the properties of the graph. While a graph on n …
Fast greedy algorithms in MapReduce and streaming
Greedy algorithms are practitioners' best friends—they are intuitive, are simple to implement,
and often lead to very good solutions. However, implementing greedy algorithms in a …