Big data analytics on Apache Spark

S Salloum, R Dautov, X Chen, PX Peng… - International Journal of …, 2016 - Springer
Apache Spark has emerged as the de facto framework for big data analytics with its
advanced in-memory programming model and upper-level libraries for scalable machine …

A survey on platforms for big data analytics

D Singh, CK Reddy - Journal of Big Data, 2015 - Springer
The primary purpose of this paper is to provide an in-depth analysis of different platforms
available for performing big data analytics. This paper surveys different hardware platforms …

Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

A survey of data partitioning and sampling methods to support big data analysis

MS Mahmud, JZ Huang, S Salloum… - Big Data Mining and …, 2020 - ieeexplore.ieee.org
Computer clusters with the shared-nothing architecture are the major computing platforms
for big data processing and analysis. In cluster computing, data partitioning and sampling …

DeepDB: Learn from data, not from queries!

B Hilprecht, A Schmidt, M Kulessa, A Molina… - arXiv preprint arXiv …, 2019 - arxiv.org
The typical approach for learned DBMS components is to capture the behavior by running a
representative set of queries and use the observations to train a machine learning model …

VideoEdge: Processing camera streams using hierarchical clusters

CC Hung, G Ananthanarayanan… - 2018 IEEE/ACM …, 2018 - ieeexplore.ieee.org
Organizations deploy a hierarchy of clusters (cameras, private clusters, public clouds) for
analyzing live video feeds from their cameras. Video analytics queries have many …

Live video analytics at scale with approximation and Delay-Tolerance

H Zhang, G Ananthanarayanan, P Bodik… - … USENIX Symposium on …, 2017 - usenix.org
Video cameras are pervasively deployed for security and smart city scenarios, with millions
of them in large cities worldwide. Achieving the potential of these cameras requires …

AWStream: Adaptive wide-area streaming analytics

B Zhang, X Jin, S Ratnasamy, J Wawrzynek… - Proceedings of the 2018 …, 2018 - dl.acm.org
The emerging class of wide-area streaming analytics faces the challenge of scarce and
variable WAN bandwidth. Non-adaptive applications built with TCP or UDP suffer from …

Data lifecycle challenges in production machine learning: a survey

N Polyzotis, S Roy, SE Whang, M Zinkevich - ACM SIGMOD Record, 2018 - dl.acm.org
Machine learning has become an essential tool for gleaning knowledge from data and
tackling a diverse set of computationally hard tasks. However, the accuracy of a machine …

Low latency geo-distributed data analytics

Q Pu, G Ananthanarayanan, P Bodik… - ACM SIGCOMM …, 2015 - dl.acm.org
Low latency analytics on geographically distributed datasets (across datacenters, edge
clusters) is an upcoming and increasingly important challenge. The dominant approach of …