Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

SystemML: Declarative machine learning on Spark

M Boehm, MW Dusenberry, D Eriksson… - Proceedings of the …, 2016 - dl.acm.org
The rising need for custom machine learning (ML) algorithms and the growing data sizes
that require the exploitation of distributed, data-parallel frameworks such as MapReduce or …

Optimization of complex dataflows with user-defined functions

A Rheinländer, U Leser, G Graefe - ACM Computing Surveys (CSUR), 2017 - dl.acm.org
In many fields, recent years have brought a sharp rise in the size of the data to be analyzed
and the complexity of the analysis to be performed. Such analyses are often described as …

The NebulaStream platform: Data and application management for the Internet of Things

S Zeuch, A Chaudhary, B Del Monte… - arXiv preprint arXiv …, 2019 - arxiv.org
The Internet of Things (IoT) presents a novel computing architecture for data management: a
distributed, highly dynamic, and heterogeneous environment of massive scale. Applications …

Evaluating end-to-end optimization for data analytics applications in Weld

S Palkar, J Thomas, D Narayanan, P Thaker… - Proceedings of the …, 2018 - dl.acm.org
Modern analytics applications use a diverse mix of libraries and functions. Unfortunately,
there is no optimization across these libraries, resulting in performance penalties as high as …

Ease.ml: Towards multi-tenant resource sharing for machine learning workloads

T Li, J Zhong, J Liu, W Wu, C Zhang - Proceedings of the VLDB …, 2018 - dl.acm.org
We present ease.ml, a declarative machine learning service platform. With ease.ml, a user
defines the high-level schema of an ML application and submits the task via a Web interface …

A survey of state management in big data processing systems

QC To, J Soto, V Markl - The VLDB Journal, 2018 - Springer
The concept of state and its applications vary widely across big data processing systems.
This is evident in both the research literature and existing systems, such as Apache Flink …

Babelfish: Efficient execution of polyglot queries

PM Grulich, S Zeuch, V Markl - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
Today's users of data processing systems come from different domains, have different levels
of expertise, and prefer different programming languages. As a result, analytical workload …

An intermediate representation for optimizing machine learning pipelines

A Kunft, A Katsifodimos, S Schelter, S Breß… - Proceedings of the …, 2019 - dl.acm.org
Machine learning (ML) pipelines for model training and validation typically include
preprocessing, such as data cleaning and feature engineering, prior to training an ML …

On optimizing operator fusion plans for large-scale machine learning in SystemML

M Boehm, B Reinwald, D Hutchison… - arXiv preprint arXiv …, 2018 - arxiv.org
Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by
means of linear algebra programs, and then automatically generate efficient execution …