Data management in machine learning: Challenges, techniques, and systems
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …
SystemML: Declarative machine learning on Spark
The rising need for custom machine learning (ML) algorithms and the growing data sizes
that require the exploitation of distributed, data-parallel frameworks such as MapReduce or …
Optimization of complex dataflows with user-defined functions
In many fields, recent years have brought a sharp rise in the size of the data to be analyzed
and the complexity of the analysis to be performed. Such analyses are often described as …
The NebulaStream platform: Data and application management for the Internet of Things
The Internet of Things (IoT) presents a novel computing architecture for data management: a
distributed, highly dynamic, and heterogeneous environment of massive scale. Applications …
Evaluating end-to-end optimization for data analytics applications in Weld
Modern analytics applications use a diverse mix of libraries and functions. Unfortunately,
there is no optimization across these libraries, resulting in performance penalties as high as …
Ease.ml: Towards multi-tenant resource sharing for machine learning workloads
We present ease.ml, a declarative machine learning service platform. With ease.ml, a user
defines the high-level schema of an ML application and submits the task via a Web interface …
A survey of state management in big data processing systems
The concept of state and its applications vary widely across big data processing systems.
This is evident in both the research literature and existing systems, such as Apache Flink …
Babelfish: Efficient execution of polyglot queries
Today's users of data processing systems come from different domains, have different levels
of expertise, and prefer different programming languages. As a result, analytical workload …
An intermediate representation for optimizing machine learning pipelines
Machine learning (ML) pipelines for model training and validation typically include
preprocessing, such as data cleaning and feature engineering, prior to training an ML …
On optimizing operator fusion plans for large-scale machine learning in SystemML
Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by
means of linear algebra programs, and then automatically generate efficient execution …