Big data systems meet machine learning challenges: towards big data science as a service

R Elshawi, S Sakr, D Talia, P Trunfio - Big Data Research, 2018 - Elsevier
Recently, we have been witnessing huge advancements in the scale of data we routinely
generate and collect in pretty much everything we do, as well as our ability to exploit modern …

A survey on spatio-temporal data analytics systems

MM Alam, L Torgo, A Bifet - ACM Computing Surveys, 2022 - dl.acm.org
Due to the surge of spatio-temporal data volume, the popularity of location-based services
and applications, and the importance of extracted knowledge from spatio-temporal data to …

Apache Spark: a unified engine for big data processing

M Zaharia, RS Xin, P Wendell, T Das… - Communications of the …, 2016 - dl.acm.org
Contributed article, Communications of the ACM, November 2016, Vol. 59, No. 11, p. 56.
DOI:10.1145/2934664 This …

Spark SQL: Relational data processing in Spark

M Armbrust, RS Xin, C Lian, Y Huai, D Liu… - Proceedings of the …, 2015 - dl.acm.org
Spark SQL is a new module in Apache Spark that integrates relational processing with
Spark's functional programming API. Built on our experience with Shark, Spark SQL lets …

[PDF] Apache Flink: Stream and batch processing in a single engine

P Carbone, A Katsifodimos, S Ewen, V Markl… - The Bulletin of the …, 2015 - diva-portal.org
Apache Flink is an open-source system for processing streaming and batch data. Flink is
built on the philosophy that many classes of data processing applications, including real …

Presto: SQL on everything

R Sethi, M Traverso, D Sundstrom… - 2019 IEEE 35th …, 2019 - ieeexplore.ieee.org
Presto is an open source distributed query engine that supports much of the SQL analytics
workload at Facebook. Presto is designed to be adaptive, flexible, and extensible. It supports …

[Book] Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems

M Kleppmann - 2017 - books.google.com
Data is at the center of many challenges in system design today. Difficult issues need to be
figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In …

Altruistic scheduling in multi-resource clusters

R Grandl, M Chowdhury, A Akella… - … USENIX symposium on …, 2016 - usenix.org
Given the well-known tradeoffs between fairness, performance, and efficiency, modern
cluster schedulers often prefer instantaneous fairness as their primary objective to ensure …

Apache Tez: A unifying framework for modeling and building data processing applications

B Saha, H Shah, S Seth, G Vijayaraghavan… - Proceedings of the …, 2015 - dl.acm.org
The broad success of Hadoop has led to a fast-evolving and diverse ecosystem of
application engines that are building upon the YARN resource management layer. The open …

Mercury: Hybrid centralized and distributed scheduling in large shared clusters

K Karanasos, S Rao, C Curino, C Douglas… - 2015 USENIX Annual …, 2015 - usenix.org
Datacenter-scale computing for analytics workloads is increasingly common. High
operational costs force heterogeneous applications to share cluster resources for achieving …